d-matrix/gemma-2b: A Quantized Gemma-2B Reference
d-matrix/gemma-2b is a 2.6-billion-parameter functional reference model based on the original Gemma-2B architecture. Developed by d-Matrix, it is designed to showcase and evaluate the effects of quantization and approximated kernel simulations on large language models.
Key Configurations
The model is provided in two primary functional configurations:
BASELINE: This configuration is functionally equivalent to the original Gemma-2B model, serving as a direct reference point.
BASIC: In this configuration, all linear algebraic operands are quantized to MXINT8-64, and other operations are transformed into approximated kernel simulations. This allows direct comparison and analysis of performance under quantization.
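To make the BASIC configuration concrete, the following is a toy NumPy sketch of block-wise shared-exponent quantization in the spirit of MXINT8-64: each 64-element block shares a single power-of-two scale, and elements are stored as 8-bit integers. The function names, rounding policy, and scale selection are illustrative assumptions for exposition, not the actual d-Matrix kernels.

```python
import numpy as np


def quantize_mxint8(x, block_size=64):
    """Toy block-wise quantization loosely modeled on MXINT8-64.

    Each block of `block_size` values shares one power-of-two scale;
    elements are stored as int8. Illustrative only -- not the d-Matrix
    kernel implementation.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size          # pad so length divides evenly
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # One shared power-of-two scale per block, chosen so the largest
    # magnitude in the block fits in the int8 range [-127, 127].
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    max_abs[max_abs == 0] = 1.0           # avoid log2(0) for all-zero blocks
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale, len(x)


def dequantize_mxint8(q, scale, n):
    """Reconstruct the original vector (up to rounding error)."""
    return (q.astype(np.float64) * scale).reshape(-1)[:n]
```

Because every element in a block is rounded to a multiple of that block's scale, the per-element reconstruction error is bounded by half the block scale, which is the kind of bounded-error behavior the BASIC configuration lets users measure end to end.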
Usage and Evaluation
To use and evaluate d-matrix/gemma-2b, install the d-Matrix Dmx_Compressor library. The model can then be loaded and plugged into evaluation frameworks such as lm-evaluation-harness to assess its performance, particularly the impact of the MXINT8-64 quantization and approximated kernels. This setup supports research and development in efficient model deployment and hardware-aware optimization.
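The workflow above can be sketched as a shell session. The pip package name (`dmx_compressor`) and the benchmark task chosen here are assumptions for illustration; check the d-Matrix documentation and model card for the exact package name and recommended tasks.

```shell
# Install the compression library and the evaluation harness
# (package names are assumptions -- verify against the official docs).
pip install dmx_compressor lm-eval

# Run lm-evaluation-harness against the Hugging Face checkpoint.
# trust_remote_code is typically required when a repo ships custom
# modeling code for configurations like BASELINE/BASIC.
lm_eval --model hf \
  --model_args pretrained=d-matrix/gemma-2b,trust_remote_code=True \
  --tasks wikitext \
  --batch_size 8
```

Running the same command against the BASELINE and BASIC configurations gives a direct measurement of how much quality the MXINT8-64 quantization and approximated kernels cost on a given benchmark.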