d-matrix/Llama3-8b

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Nov 11, 2024 · Architecture: Transformer

d-matrix/Llama3-8b is an 8-billion-parameter functional reference of the Llama 3 model, provided by d-Matrix. It ships with two configurations: a BASELINE that is functionally equivalent to the original model, and a BASIC version with all linear-algebraic operands quantized to MXINT8-64. The model is designed for evaluating the impact of d-Matrix's Dmx_Compressor on Llama 3's performance and functionality.


d-matrix/Llama3-8b Overview

d-matrix/Llama3-8b is an 8 billion parameter functional reference of the Llama 3 model, developed by d-Matrix. This model serves as a reference implementation to demonstrate and evaluate the capabilities of the d-Matrix Dmx_Compressor.

Key Configurations

The model is provided with two primary functional configurations:

  • BASELINE: This configuration is functionally equivalent to the original Llama 3 model, serving as an unquantized reference.
  • BASIC: In this configuration, all linear algebraic operands within the model are quantized to MXINT8-64, showcasing the effects of d-Matrix's compression technology.
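MXINT8-64 here denotes a microscaling-style block format: int8 element values that share one power-of-two scale per block of 64 values. The toy NumPy quantizer below is a sketch of that idea under those assumptions, not d-Matrix's actual implementation; the function name `quantize_mxint8` is ours.

```python
import numpy as np

def quantize_mxint8(x, block_size=64):
    """Toy MXINT8-style round trip: each block of `block_size` values
    shares one power-of-two scale; elements are stored as int8 codes.
    Returns the dequantized array for comparison against the input."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    for i, b in enumerate(blocks):
        amax = np.max(np.abs(b))
        if amax == 0.0:
            out[i] = 0.0
            continue
        # Shared power-of-two scale so the largest magnitude fits in [-127, 127].
        scale = 2.0 ** np.ceil(np.log2(amax / 127.0))
        q = np.clip(np.round(b / scale), -127, 127)  # int8 element codes
        out[i] = q * scale  # dequantize for error inspection
    return out.reshape(-1)[: len(x)]
```

Because the scale is shared across 64 elements, the worst-case rounding error of a block is half the block's scale, which is what makes the format cheap in hardware while staying close to per-tensor int8 accuracy.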

Usage and Evaluation

Developers can integrate this model with the d-Matrix Dmx_Compressor to transform and evaluate its performance. The provided example demonstrates how to load the model using dmx.compressor.modeling.DmxModel.from_torch and evaluate it with lm_eval on tasks like "wikitext", allowing for direct comparison between the baseline and quantized versions.
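The workflow above can be sketched as a small helper. This is a hedged sketch, assuming the `dmx-compressor` and `lm-eval` packages are installed; `evaluate_dmx_model` is a hypothetical name of ours, while `DmxModel.from_torch` and the "wikitext" task come from the card itself.

```python
# Hypothetical helper (the function name is ours) sketching the flow
# described above; assumes `dmx-compressor` and `lm-eval` are installed.
def evaluate_dmx_model(model_id: str = "d-matrix/Llama3-8b",
                       task: str = "wikitext"):
    """Wrap the reference checkpoint with Dmx_Compressor and score it
    with lm_eval, so baseline and quantized runs can be compared."""
    import lm_eval
    from dmx.compressor.modeling import DmxModel

    # Build an lm_eval harness model backed by the Hugging Face checkpoint.
    lm = lm_eval.api.registry.get_model("hf").create_from_arg_string(
        f"pretrained={model_id}", {}
    )
    # Transform the underlying torch module into a DmxModel, per the card.
    lm._model = DmxModel.from_torch(lm._model)
    # Evaluate; for "wikitext" the results include word-level perplexity.
    return lm_eval.simple_evaluate(model=lm, tasks=[task])
```

Running the helper once on the BASELINE configuration and once after applying the BASIC (MXINT8-64) transform gives a direct perplexity comparison between the two versions.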