Entrit/Qwen2.5-3B-trit-uniform-d2: Balanced Ternary Quantization
This model is a 3.1 billion parameter variant of the Qwen2.5-3B architecture, developed by Entrit Systems. It features a balanced ternary post-training quantization (PTQ) scheme, reducing its weights to an information content of 3.47 bits per weight with 9 levels per weight (depth d=2). This quantization is based on the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-3B
- Quantization Method: Uniform PTQ
- Bits per Weight: 3.47
- Depth: d=2 (9 levels)
- Quantized Layers: All 2D linear matrices
- Kept in FP16: `lm_head`, token embeddings, and all `*_norm` layers
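To make the scheme above concrete, here is a minimal sketch of uniform quantization to 9 levels with depth-2 balanced ternary digits. This is not the tritllm-codec implementation; the function names and the per-tensor max-abs scale are illustrative assumptions.

```python
# Illustrative sketch of uniform balanced-ternary PTQ at depth d=2
# (9 levels, integer range -4..4). Assumed, not the tritllm-codec code.

def to_balanced_ternary(v, depth=2):
    """Decompose an integer level into `depth` balanced-ternary digits
    (each -1, 0, or 1), most significant digit first."""
    digits = []
    for _ in range(depth):
        r = v % 3            # Python's % keeps r in {0, 1, 2}
        if r == 2:
            r = -1           # remap 2 -> -1 (the balanced digit)
        digits.append(r)
        v = (v - r) // 3
    return digits[::-1]

def from_balanced_ternary(digits):
    """Inverse of to_balanced_ternary."""
    v = 0
    for d in digits:
        v = 3 * v + d
    return v

def quantize(weights, depth=2):
    """Map float weights to integer levels in [-4, 4] using an
    assumed per-tensor max-abs scale."""
    max_level = (3 ** depth - 1) // 2          # 4 for d=2
    scale = max(abs(w) for w in weights) / max_level
    levels = [max(-max_level, min(max_level, round(w / scale)))
              for w in weights]
    return levels, scale

def dequantize(levels, scale):
    """Recover float weights from integer levels, as a stock-transformers
    checkpoint would store them."""
    return [q * scale for q in levels]
```

For example, `quantize([0.8, -0.3, 0.05, -1.0])` yields levels `[3, -1, 0, -4]` with scale `0.25`, and each level then decomposes into two balanced trits.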
While the on-disk size remains close to that of the FP16 source (weights are stored dequantized for compatibility with stock transformers), the 3.47-bpw figure reflects the efficiency available to specialized hardware that consumes the packed trit format directly. The tritllm-codec and tritllm-kernel projects provide the underlying quantization and inference technology.
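The card does not document the packed trit format itself, but the idea can be sketched: balanced trits can be packed several to a byte as a base-3 integer. The scheme below (5 trits per byte, since 3^5 = 243 fits in 8 bits) is a hypothetical illustration only; the actual tritllm packing, and how it arrives at the 3.47-bpw figure, may differ.

```python
# Hypothetical packed-trit layout: 5 trits per byte, base-3 encoded.
# Illustrative only -- the real tritllm format is not documented here.

def pack_trits(trits):
    """Pack a list of balanced trits (-1/0/1) into bytes, 5 per byte."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        group = trits[i:i + 5]
        group += [0] * (5 - len(group))    # zero-pad the final group
        value = 0
        for t in group:
            value = value * 3 + (t + 1)    # map -1/0/1 -> 0/1/2
        out.append(value)
    return bytes(out)

def unpack_trits(data, n):
    """Inverse of pack_trits; returns the first n trits."""
    trits = []
    for byte in data:
        group = []
        for _ in range(5):
            group.append(byte % 3 - 1)     # map 0/1/2 -> -1/0/1
            byte //= 3
        trits.extend(group[::-1])          # restore original digit order
    return trits[:n]
```

At 5 trits per byte this stores 1.6 bits per trit, which is why trit-aware hardware can exploit the representation far better than a byte-aligned integer format.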
Good for
- Deploying Qwen2.5-3B in environments requiring reduced memory footprint and faster inference with specialized hardware.
- Research and development in efficient LLM quantization techniques, particularly balanced ternary methods.
- Applications where minor accuracy degradation from quantization is an acceptable trade-off for reduced model size and faster inference.