Entrit/Mistral-7B-v0.3-trit-uniform-d2: Quantized Mistral-7B-v0.3
This model is a 7 billion parameter variant of mistralai/Mistral-7B-v0.3 produced with balanced ternary post-training quantization (PTQ). Developed by Entrit Systems, it uses depth-2 quantization: each weight is encoded as two balanced trits, giving 3² = 9 levels per weight, at an information density of 3.47 bits per weight.
Key Quantization Details
- Source Model: mistralai/Mistral-7B-v0.3
- Quantization Method: Uniform PTQ with a depth of 2 (9 levels per weight).
- Bits per Weight: 3.47, indicating significant compression of the model's information content.
- Codec: Produced using the tritllm v2 codec, detailed in the Entrit/tritllm-codec repository.
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Layers: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for compatibility and performance.
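The scheme above can be illustrated with a minimal sketch. This is not the tritllm v2 codec itself (which is defined in the Entrit/tritllm-codec repository); it only shows the depth-2 idea: each weight is rounded to one of the 9 levels {-4, ..., 4} under a per-tensor scale, and each level splits into two balanced trits.

```python
import numpy as np

def quantize_bt_depth2(w: np.ndarray):
    """Depth-2 balanced ternary PTQ sketch (illustrative, not the tritllm v2 codec).

    Each weight maps to one of 3**2 = 9 integer levels {-4, ..., 4},
    i.e. a pair of balanced trits (t1, t0) with level = 3*t1 + t0,
    where each trit is in {-1, 0, 1}.
    """
    scale = np.max(np.abs(w)) / 4.0          # per-tensor scale onto [-4, 4]
    q = np.clip(np.round(w / scale), -4, 4).astype(np.int8)
    t0 = ((q + 4) % 3) - 1                   # low trit in {-1, 0, 1}
    t1 = (q - t0) // 3                       # high trit in {-1, 0, 1}
    return t1, t0, scale

def dequantize_bt_depth2(t1, t0, scale):
    """Reconstruct FP32 weights from the two trit planes and the scale."""
    return (3 * t1 + t0).astype(np.float32) * scale
```

Real codecs typically use per-group or per-channel scales rather than a single per-tensor scale; the decomposition into trit planes is the same either way.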
Performance and Usage
The repository ships dequantized FP16 tensors for compatibility with stock transformers, so the on-disk size matches the FP16 source; the 3.47-bpw figure applies on hardware that consumes the packed trit format directly, such as through the Entrit/tritllm-kernel. That is the deployment path this model is designed for.
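For intuition on what a "packed trit format" means, here is a generic base-3 packing sketch: since 3⁵ = 243 ≤ 256, five balanced trits fit in one byte. The actual on-device layout used by Entrit/tritllm-kernel is not documented here and may differ (e.g. grouping, scale metadata); this only shows the packing arithmetic.

```python
def pack_trits(trits):
    """Pack balanced trits {-1, 0, 1} five to a byte (3**5 = 243 <= 256).

    Generic base-3 packing sketch; not the Entrit/tritllm-kernel layout.
    """
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):
            val = val * 3 + (t + 1)      # shift trit to unsigned digit {0, 1, 2}
        out.append(val)
    return bytes(out)

def unpack_trits(data, n):
    """Recover the first n balanced trits from packed bytes."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)   # lowest base-3 digit first
            byte //= 3
    return trits[:n]
```

At five trits per byte, the trit payload costs 1.6 bits per trit; real formats add per-group scale metadata on top of this.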
Citation
This quantization method is based on the work "Balanced Ternary Post-Training Quantization for Large Language Models" by Eric Stentzel (2026).