Entrit/Qwen2.5-32B-trit-uniform-d3
Entrit/Qwen2.5-32B-trit-uniform-d3 is a 32.8 billion parameter Qwen2.5 model from Entrit Systems, quantized with balanced ternary post-training quantization (PTQ) at depth d=3. The scheme reduces the information content of the quantized weights to 5.05 bits per weight. The packed trit representation targets hardware that can consume it directly; for general use, the published weights are dequantized to FP16 so the model loads with standard transformers. It is intended for applications that need a smaller memory footprint and faster inference with minimal performance degradation. Those benefits depend on consuming the packed format, since the FP16 checkpoint is the same size as the source model.
Entrit/Qwen2.5-32B-trit-uniform-d3: Balanced Ternary Quantization
This model is a 32.8 billion parameter variant of the Qwen2.5-32B architecture, developed by Entrit Systems. It implements a novel balanced ternary post-training quantization (PTQ) scheme, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Quantization Method: Uniform PTQ applied to all 2D linear matrices.
- Bits Per Weight (BPW): Achieves an effective 5.05 BPW, representing the information content of the quantized weights.
- Depth: d=3, that is, three balanced trits per weight, giving 3^3 = 27 levels.
- Group Size: 16.
- Codec: Uses the tritllm v2 codec, available in the Entrit/tritllm-codec repository.
- FP16 Preservation: Key components like lm_head, token embeddings, and all *_norm layers remain in FP16 to preserve model integrity.
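To make the depth and group-size settings above concrete, here is a minimal sketch of uniform balanced ternary quantization for a single group of 16 weights. It is an illustration only, not the tritllm v2 codec: the absmax scaling rule, the rounding mode, and all function names (to_trits, from_trits, quantize_group, dequantize_group) are assumptions made for this example.

```python
import math

DEPTH = 3                      # trits per weight (d=3, from the card)
LEVELS = 3 ** DEPTH            # 27 levels per weight
MAX_CODE = (LEVELS - 1) // 2   # 13: integer codes span -13..13
GROUP_SIZE = 16                # weights sharing one scale (from the card)


def to_trits(code, depth=DEPTH):
    """Balanced ternary digits of an integer, least significant first."""
    trits = []
    for _ in range(depth):
        r = code % 3           # Python's % returns 0, 1 or 2, even for negatives
        if r == 2:
            r = -1             # digit 2 becomes -1 with a carry into the next trit
        trits.append(r)
        code = (code - r) // 3
    return trits


def from_trits(trits):
    """Inverse of to_trits: rebuild the integer code."""
    return sum(t * 3 ** i for i, t in enumerate(trits))


def quantize_group(weights):
    """Assumed scheme: absmax scale, round to the nearest of 27 levels."""
    peak = max(abs(w) for w in weights)
    scale = peak / MAX_CODE if peak > 0 else 1.0
    codes = [max(-MAX_CODE, min(MAX_CODE, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_group(scale, codes):
    """Map integer codes back to floating-point weights."""
    return [scale * c for c in codes]


# Information carried by one depth-3 code, in bits (about 4.755).
bits_per_code = DEPTH * math.log2(3)

if __name__ == "__main__":
    group = [0.1 * (i - 8) for i in range(GROUP_SIZE)]
    scale, codes = quantize_group(group)
    print(codes[0], to_trits(codes[0]))
    print(f"{bits_per_code:.3f} bits per code")
```

Three trits alone carry about 4.755 bits, so the 5.05 BPW figure implies roughly 0.3 bits per weight of additional overhead. The card does not itemize that overhead; per-group metadata such as scales is a plausible source, but that is a guess rather than something the source states.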
Performance and Use Cases
Because the weights are dequantized to FP16 for transformers compatibility, the on-disk size matches the FP16 source. The 5.05 BPW figure matters for hardware designed to consume packed trit formats directly, where it enables more efficient inference. The model is therefore best suited to deployments where memory footprint and inference speed are critical and a packed trit path is available. Basic operation needs no specialized hardware, since the checkpoint runs as an ordinary FP16 model. For full evaluation results and technical specifics, refer to the associated paper.
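As a rough back-of-the-envelope check on the footprint argument, the sketch below compares FP16 storage with a hypothetical packed trit layout at 5.05 BPW. It assumes, for simplicity, that the 5.05 figure applies to all 32.8 billion parameters. Since lm_head, the token embeddings, and the norm layers stay in FP16, a real packed checkpoint would be somewhat larger than this estimate. The helper name size_gb is made up for this example.

```python
N_PARAMS = 32.8e9   # parameter count stated on the card
FP16_BITS = 16      # bits per weight in the dequantized checkpoint
TRIT_BPW = 5.05     # effective bits per weight stated on the card


def size_gb(n_params, bits_per_weight):
    """Storage in gigabytes (10**9 bytes) for n_params weights."""
    return n_params * bits_per_weight / 8 / 1e9


fp16_gb = size_gb(N_PARAMS, FP16_BITS)     # what transformers loads today
packed_gb = size_gb(N_PARAMS, TRIT_BPW)    # simplified packed trit estimate
ratio = FP16_BITS / TRIT_BPW               # compression relative to FP16

if __name__ == "__main__":
    print(f"FP16: {fp16_gb:.1f} GB, packed: {packed_gb:.1f} GB, ratio: {ratio:.2f}x")
```

Under these assumptions the FP16 checkpoint is about 65.6 GB and the packed estimate about 20.7 GB, a ratio of a little over 3x.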