Entrit/Mistral-7B-v0.3-trit-uniform-d2: Quantized Mistral-7B-v0.3
This model is a 7 billion parameter variant of mistralai/Mistral-7B-v0.3 produced with balanced ternary post-training quantization (PTQ). Developed by Entrit Systems, it uses depth-2 quantization: each weight is encoded as two balanced trits, giving 3² = 9 levels per weight, at an information density of 3.47 bits per weight.
Key Quantization Details
- Source Model: mistralai/Mistral-7B-v0.3
- Quantization Method: Uniform PTQ with a depth of 2 (9 levels per weight).
- Bits per Weight: 3.47, indicating significant compression of the model's information content.
- Codec: Produced using the tritllm v2 codec, detailed in the Entrit/tritllm-codec repository.
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Layers: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for compatibility and performance.
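The scheme above can be illustrated with a minimal sketch. This is not the tritllm v2 codec itself (which is defined in the Entrit/tritllm-codec repository); it only shows the depth-2 idea: each weight is rounded to one of the 9 levels {-4, ..., 4} under a per-tensor scale, and each level splits into two balanced trits.

```python
import numpy as np

def quantize_bt_depth2(w: np.ndarray):
    """Depth-2 balanced ternary PTQ sketch (illustrative, not the tritllm v2 codec).

    Each weight maps to one of 3**2 = 9 integer levels {-4, ..., 4},
    i.e. a pair of balanced trits (t1, t0) with level = 3*t1 + t0,
    where each trit is in {-1, 0, 1}.
    """
    scale = np.max(np.abs(w)) / 4.0          # per-tensor scale onto [-4, 4]
    q = np.clip(np.round(w / scale), -4, 4).astype(np.int8)
    t0 = ((q + 4) % 3) - 1                   # low trit in {-1, 0, 1}
    t1 = (q - t0) // 3                       # high trit in {-1, 0, 1}
    return t1, t0, scale

def dequantize_bt_depth2(t1, t0, scale):
    """Reconstruct FP32 weights from the two trit planes and the scale."""
    return (3 * t1 + t0).astype(np.float32) * scale
```

Real codecs typically use per-group or per-channel scales rather than a single per-tensor scale; the decomposition into trit planes is the same either way.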
Performance and Usage
The repository ships dequantized FP16 tensors for compatibility with stock transformers, so the on-disk size matches the FP16 source; the 3.47-bpw figure applies on hardware that consumes the packed trit format directly, such as through the Entrit/tritllm-kernel. That is the deployment path this model is designed for.
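For intuition on what a "packed trit format" means, here is a generic base-3 packing sketch: since 3⁵ = 243 ≤ 256, five balanced trits fit in one byte. The actual on-device layout used by Entrit/tritllm-kernel is not documented here and may differ (e.g. grouping, scale metadata); this only shows the packing arithmetic.

```python
def pack_trits(trits):
    """Pack balanced trits {-1, 0, 1} five to a byte (3**5 = 243 <= 256).

    Generic base-3 packing sketch; not the Entrit/tritllm-kernel layout.
    """
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):
            val = val * 3 + (t + 1)      # shift trit to unsigned digit {0, 1, 2}
        out.append(val)
    return bytes(out)

def unpack_trits(data, n):
    """Recover the first n balanced trits from packed bytes."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)   # lowest base-3 digit first
            byte //= 3
    return trits[:n]
```

At five trits per byte, the trit payload costs 1.6 bits per trit; real formats add per-group scale metadata on top of this.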
Citation
This quantization method is based on the work "Balanced Ternary Post-Training Quantization for Large Language Models" by Eric Stentzel (2026).