d-matrix/Llama-3.2-1B

  • Status: Warm
  • Visibility: Public
  • Parameters: 1B
  • Precision: BF16
  • Context length: 32768
  • Updated: Oct 24, 2024
  • Source: Hugging Face

Overview

d-matrix/Llama-3.2-1B is a 1-billion-parameter functional reference model provided by d-Matrix. It serves as a demonstration and evaluation platform for the d-Matrix Dmx_Compressor, showcasing its application to the Llama 3.2-1B architecture.

Key Configurations

The model offers two primary functional configurations:

  • BASELINE: This configuration is designed to be functionally equivalent to the original Llama 3.2-1B model, providing a direct reference point.
  • BASIC: In this configuration, all linear algebraic operands within the model are quantized to MXINT8-64. This allows developers to assess the performance and functional impact of d-Matrix's quantization techniques.
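To give a feel for what MXINT8-64 quantization does, the sketch below simulates block quantization in plain Python: each block of 64 values shares one power-of-two scale, and each element is rounded to a signed 8-bit integer at that scale. This is an illustrative approximation only, not d-Matrix's implementation; the exact MX format details (mantissa width, scale encoding, rounding mode) may differ.

```python
import math

def quantize_mxint8(values, block_size=64):
    """Simulate MXINT8-style block quantization: each block of
    `block_size` values shares one power-of-two scale, and each
    element is rounded to an 8-bit signed integer at that scale.
    Illustrative only; not the Dmx_Compressor implementation."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend([0.0] * len(block))
            continue
        # Shared power-of-two scale chosen so the largest magnitude
        # in the block fits within the int8 range.
        scale = 2.0 ** (math.floor(math.log2(amax)) - 6)
        for v in block:
            q = max(-128, min(127, round(v / scale)))  # clamp to int8
            out.append(q * scale)  # dequantized value
    return out
```

Comparing the output of such a function against the original values shows the kind of rounding error the BASIC configuration introduces into every linear operand.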

Usage and Evaluation

To use this model, install the d-Matrix Dmx_Compressor library. The README provides a Python example demonstrating how to load the model with DmxModel.from_torch and integrate it with the lm-evaluation-harness for evaluation tasks such as wikitext. This setup enables direct comparison between the baseline and quantized versions.
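The loading-and-evaluation flow described above might look roughly like the following sketch. Apart from DmxModel.from_torch, which the model card names, everything here is an assumption: the dmx.compressor module path, the lm_eval.simple_evaluate call, and the argument names may all differ from the actual README, so treat this as pseudocode and defer to the official example.

```python
# Illustrative pseudocode only; module paths and harness calls are assumed.
from transformers import AutoModelForCausalLM
from dmx.compressor.modeling import DmxModel  # import path assumed
import lm_eval                                # lm-evaluation-harness

# Load the reference checkpoint from Hugging Face.
hf_model = AutoModelForCausalLM.from_pretrained(
    "d-matrix/Llama-3.2-1B", trust_remote_code=True
)

# Wrap the PyTorch model so Dmx transformations (e.g. the BASIC
# MXINT8-64 configuration) can be applied.
model = DmxModel.from_torch(hf_model)

# Evaluate on wikitext via the harness (exact API assumed).
results = lm_eval.simple_evaluate(model=model, tasks=["wikitext"])
```

Running the same evaluation once in the BASELINE configuration and once in BASIC is what allows the direct perplexity comparison the card describes.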

Intended Use

This model is primarily intended for developers and researchers interested in:

  • Understanding the functional behavior of Llama 3.2-1B under different quantization schemes.
  • Evaluating the impact of d-Matrix's MXINT8-64 quantization on model performance and accuracy.
  • Experimenting with the Dmx_Compressor toolchain for model optimization.