d-matrix/Llama3-8b
d-matrix/Llama3-8b is an 8 billion parameter functional reference of the Llama 3 model, provided by d-Matrix. It includes two configurations: a baseline that is functionally equivalent to the original model, and a basic version with all linear algebraic operands quantized to MXINT8-64. This model is designed for evaluating the impact of d-Matrix's Dmx_Compressor on Llama 3's performance and functionality.
d-matrix/Llama3-8b Overview
d-matrix/Llama3-8b is an 8 billion parameter functional reference of the Llama 3 model, developed by d-Matrix. This model serves as a reference implementation to demonstrate and evaluate the capabilities of the d-Matrix Dmx_Compressor.
Key Configurations
The model is provided with two primary functional configurations:
BASELINE: This configuration is functionally equivalent to the original Llama 3 model, serving as an unquantized reference.
BASIC: In this configuration, all linear algebraic operands within the model are quantized to MXINT8-64, showcasing the effects of d-Matrix's compression technology.
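For intuition about what MXINT8-64 quantization does, the following is a minimal NumPy sketch of a microscaling-style integer format: each block of elements shares a single power-of-two scale while individual elements are stored as int8 mantissas. This is an illustrative toy, not d-Matrix's implementation, and it assumes the "-64" suffix denotes a block size of 64; the actual element encoding in the OCP Microscaling (MX) spec differs in detail.

```python
import numpy as np

def quantize_mxint8(x, block_size=64):
    """Toy MXINT8-style quantizer (illustrative sketch, not d-Matrix's code).

    Each block of `block_size` values shares one power-of-two scale;
    elements are stored as int8 mantissas. Input length must be a
    multiple of `block_size`.
    """
    blocks = np.asarray(x, dtype=np.float32).reshape(-1, block_size)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax = np.where(amax == 0.0, 1.0, amax)       # avoid log2(0) for all-zero blocks
    exponent = np.ceil(np.log2(amax / 127.0))     # smallest e with amax / 2**e <= 127
    scale = np.exp2(exponent).astype(np.float32)  # shared power-of-two scale per block
    mantissa = np.clip(np.round(blocks / scale), -128, 127).astype(np.int8)
    return mantissa, scale

def dequantize_mxint8(mantissa, scale):
    """Reconstruct approximate float values from mantissas and block scales."""
    return mantissa.astype(np.float32) * scale
```

Because the scale is shared per block, large and small values in the same block trade precision, which is exactly the effect the BASIC configuration lets you measure end to end.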
Usage and Evaluation
Developers can integrate this model with the d-Matrix Dmx_Compressor to transform and evaluate its performance. The provided example demonstrates how to load the model using dmx.compressor.modeling.DmxModel.from_torch and evaluate it with lm_eval on tasks like "wikitext", allowing for direct comparison between the baseline and quantized versions.
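The workflow described above might look roughly like the sketch below. The `lm_eval` entry points (`get_model`, `simple_evaluate`) and the `DmxModel.from_torch` call are taken from the description here, but the exact argument names, the `_model` attribute, and the device/batch settings are assumptions that should be checked against the model card's own example.

```python
# Hedged sketch: exact dmx.compressor and lm_eval APIs may differ.
import lm_eval
from dmx.compressor.modeling import DmxModel

# Load the model through lm_eval's Hugging Face wrapper (argument names assumed).
lm = lm_eval.api.registry.get_model("hf").create_from_arg_string(
    "pretrained=d-matrix/Llama3-8b,trust_remote_code=True",
    {"batch_size": 1, "device": "cuda"},
)

# Wrap the underlying torch model so Dmx transformations can be applied.
lm._model = DmxModel.from_torch(lm._model)

# Evaluate perplexity on wikitext; rerun under the BASELINE and BASIC
# configurations to compare the unquantized and MXINT8-64 versions.
results = lm_eval.simple_evaluate(model=lm, tasks=["wikitext"])
print(results["results"]["wikitext"])
```

Running the same script under both configurations yields the direct baseline-versus-quantized comparison the card describes.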