d-matrix/gemma-2b

Hugging Face · Text generation · Concurrency cost: 1 · Model size: 2.6B · Quant: BF16 · Context length: 8k · Architecture: Transformer

d-matrix/gemma-2b is a 2.6-billion-parameter functional reference of the Gemma model, provided by d-Matrix. It ships in two configurations: a BASELINE equivalent to the original Gemma-2B, and a BASIC version whose linear algebraic operands are quantized to MXINT8-64. The model is intended for evaluating the impact of quantization and approximated kernel simulations on model performance.


d-matrix/gemma-2b: A Quantized Gemma-2B Reference

d-matrix/gemma-2b is a 2.6 billion parameter functional reference model based on the original Gemma-2B architecture. Developed by d-Matrix, this model is specifically designed to showcase and evaluate the effects of quantization and approximated kernel simulations on large language models.

Key Configurations

The model is provided in two primary functional configurations:

  • BASELINE: This configuration is functionally equivalent to the original Gemma-2B model, serving as a direct reference point.
  • BASIC: In this configuration, all linear algebraic operands are quantized to MXINT8-64 (the microscaling MXINT8 format, in which a block of 64 elements shares one scale and each element is stored as an 8-bit integer), and other operations are transformed into approximated kernel simulations. This allows direct comparison against the BASELINE to analyze performance under quantization.
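The number format used by the BASIC configuration can be illustrated with a short sketch. This is a simplified stand-in for the block-floating-point idea behind MXINT8-64 (64 elements sharing one power-of-two scale, each element stored as int8), not the exact OCP Microscaling encoding or d-Matrix's kernels; the function names are hypothetical:

```python
import numpy as np

def mxint8_quantize(x: np.ndarray, block: int = 64):
    """Simplified MXINT8-style block quantization (illustrative only).

    Each block of `block` values shares one power-of-two scale; elements
    are rounded to int8. Assumes x.size is a multiple of `block`.
    """
    x = x.reshape(-1, block)
    maxabs = np.max(np.abs(x), axis=1, keepdims=True)
    maxabs = np.where(maxabs == 0, 1.0, maxabs)  # avoid log2(0) for all-zero blocks
    # Smallest power-of-two scale such that maxabs / scale fits in [-127, 127].
    scale = 2.0 ** np.ceil(np.log2(maxabs / 127.0))
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def mxint8_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and block scales."""
    return q.astype(np.float32) * scale

# Per-element reconstruction error is bounded by half the block's scale.
x = np.random.default_rng(0).standard_normal(128).astype(np.float32)
q, scale = mxint8_quantize(x)
x_hat = mxint8_dequantize(q, scale)
```

Because the shared scale is a power of two, dequantization on hardware reduces to an integer multiply plus an exponent shift, which is what makes block formats like MXINT8 attractive for accelerator kernels.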

Usage and Evaluation

To run and evaluate d-matrix/gemma-2b, first install d-Matrix's Dmx_Compressor library. The model can then be loaded and plugged into evaluation frameworks such as lm-evaluation-harness to measure the impact of the MXINT8-64 quantization and approximated kernels, typically by scoring the BASELINE and BASIC configurations on the same tasks. This setup supports research and development in efficient model deployment and hardware-aware optimization.
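A minimal evaluation sketch of that workflow follows. The `dmx.compressor.modeling` import path and the `DmxModel.from_torch` wrapper are assumptions about the Dmx_Compressor API (consult the d-Matrix documentation for the exact interface); only the lm-evaluation-harness argument string is constructed and checked here, since running the 2.6B model requires a download and suitable hardware:

```python
def build_model_args(repo_id: str, dtype: str = "bfloat16") -> str:
    """Compose the lm-evaluation-harness HF model_args string for a repo.

    trust_remote_code=True is needed because the d-Matrix card ships
    custom modeling code.
    """
    return f"pretrained={repo_id},trust_remote_code=True,dtype={dtype}"

def evaluate_gemma(task: str = "wikitext"):
    """Sketch of the full loop (not executed here: needs network + GPU).

    DmxModel.from_torch is an assumed Dmx_Compressor entry point that
    wraps a loaded torch model so the d-Matrix configurations apply.
    """
    import lm_eval
    from dmx.compressor.modeling import DmxModel  # assumed import path

    lm = lm_eval.api.registry.get_model("hf").create_from_arg_string(
        build_model_args("d-matrix/gemma-2b"), {"batch_size": 1}
    )
    lm._model = DmxModel.from_torch(lm._model)  # swap in the DMX reference
    return lm_eval.evaluate(lm, lm_eval.tasks.get_task_dict([task]))
```

Scoring the same task once per configuration isolates the quantization effect, since everything else in the harness run is held fixed.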