Entrit/Mistral-7B-v0.3-trit-uniform-d1
Entrit/Mistral-7B-v0.3-trit-uniform-d1 is a balanced-ternary quantized version of the 7-billion-parameter Mistral-7B-v0.3 language model. Developed by Entrit Systems, it uses a novel post-training quantization method that achieves 1.88 bits per weight. The model is optimized for efficient inference on hardware that can consume the packed trit format, offering a highly compressed representation of the original Mistral-7B-v0.3.
Overview
Entrit/Mistral-7B-v0.3-trit-uniform-d1 is a 7 billion parameter language model derived from mistralai/Mistral-7B-v0.3 through balanced ternary post-training quantization. This model, developed by Entrit Systems, implements a unique quantization scheme achieving 1.88 bits per weight at a depth of d=1 (3 levels per weight).
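To make the d=1 scheme concrete, the sketch below shows one way a weight tensor can be mapped to 3 levels ({-1, 0, +1} times a per-tensor scale) and dequantized back to FP16. The threshold and scale heuristics here are illustrative assumptions and do not reproduce the actual tritllm-codec (v2) procedure.

```python
import torch

def ternary_quantize(w: torch.Tensor, threshold_ratio: float = 0.7):
    """Illustrative d=1 (3-level) quantization of a weight tensor.

    The zero-band threshold and per-tensor scale below are assumed
    heuristics; the real tritllm-codec scheme may differ.
    """
    delta = threshold_ratio * w.abs().mean()      # zero-band threshold (assumption)
    trits = torch.zeros_like(w, dtype=torch.int8)
    trits[w > delta] = 1
    trits[w < -delta] = -1
    mask = trits != 0
    scale = w[mask].abs().mean() if mask.any() else w.new_tensor(0.0)
    return trits, scale

def dequantize_fp16(trits: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct FP16 weights from trits for standard transformers inference."""
    return trits.to(torch.float16) * scale.to(torch.float16)
```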
Key Quantization Details
This model utilizes the tritllm-codec (v2) for quantization, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" by Stentzel (2026). Key specifications include:
- Source Model: mistralai/Mistral-7B-v0.3
- Quantization Depth: d=1 (3 levels)
- Bits per Weight: 1.88
- Method: Uniform Post-Training Quantization (PTQ)
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Kept: lm_head, token embeddings, and all *_norm layers remain in FP16 for compatibility (see the layer-selection sketch after this list).
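As a rough illustration of the layer selection above, the snippet below walks the standard transformers module layout for Mistral-7B-v0.3 and separates the 2D linear matrices from the layers the card says stay in FP16. The module-name heuristics are assumptions based on the stock Mistral architecture, not on Entrit's tooling.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3", torch_dtype=torch.float16
)

to_quantize, kept_fp16 = [], []
for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and "lm_head" not in name:
        to_quantize.append(name)   # attention and MLP projection matrices
    elif isinstance(module, nn.Embedding) or name.endswith("norm"):
        kept_fp16.append(name)     # token embeddings and RMSNorm layers stay FP16

print(f"{len(to_quantize)} linear layers selected for ternary quantization")
print(f"{len(kept_fp16)} layers kept in FP16 (embeddings, norms)")
```

Note that lm_head is itself an nn.Linear, so it is excluded by name to match the FP16-kept list above.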
Unique Differentiator
While the weights are dequantized to FP16 for standard transformers compatibility (resulting in an on-disk size similar to the FP16 source), the 1.88-bpw figure reflects the information content of the quantized weights. This makes the model particularly relevant for specialized hardware that can process the packed trit format directly, enabling highly efficient inference. The model's design focuses on significant weight compression while maintaining performance, making it a strong candidate for resource-constrained environments or edge deployments when paired with compatible inference kernels.
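For kernels or hardware that consume trits directly, one simple packing layout stores 5 trits per byte (3^5 = 243 distinct values), i.e. 1.6 bits per trit. The functions below are a hypothetical sketch of such a layout; the actual packed trit format, and the accounting behind the 1.88-bpw figure, may differ.

```python
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced trits {-1, 0, +1} into bytes, 5 trits per byte (assumed layout)."""
    t = (trits.astype(np.int64) + 1).ravel()                          # shift to {0, 1, 2}
    t = np.concatenate([t, np.zeros((-len(t)) % 5, dtype=np.int64)])  # pad to multiple of 5
    powers = 3 ** np.arange(5)                                        # base-3 place values
    return (t.reshape(-1, 5) * powers).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n balanced trits in {-1, 0, +1} from packed bytes."""
    digits = packed.astype(np.int64)[:, None] // (3 ** np.arange(5)) % 3
    return (digits.ravel()[:n] - 1).astype(np.int8)
```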