Entrit/Qwen2.5-1.5B-trit-uniform-d4
Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a 1.5-billion-parameter Qwen2.5 model from Entrit with balanced ternary post-training quantization. It uses a depth-4 quantization scheme (81 levels per weight, an effective 6.64 bits per weight versus 16 for FP16) and is optimized for efficient inference on hardware that can consume packed trit formats, making it suitable for resource-constrained environments.
Overview
Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a quantized version of the Qwen/Qwen2.5-1.5B large language model, developed by Entrit. This model implements a balanced ternary post-training quantization (PTQ) method, as described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-1.5B
- Quantization Depth: d=4, i.e. 3^4 = 81 levels per weight.
- Bits per Weight: An effective 6.64 bits per weight, versus 16 bits for the FP16 source.
- Method: Uniform PTQ applied to all 2D linear matrices.
- Exclusions: `lm_head`, token embeddings, and all `*_norm` layers are kept in FP16 for compatibility and performance.
- Codec: Uses the `tritllm v2` codec, available at Entrit/tritllm-codec.
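The depth-4 scheme above can be sketched in a few lines. This is a minimal illustration under assumed symmetric max-abs scaling; the paper's actual calibration procedure is not reproduced here. Note that the raw code entropy, 4 × log2(3) ≈ 6.34 bits, sits slightly below the card's effective 6.64 bits per weight, which presumably also counts codec or scale metadata (an assumption):

```python
import math

def quantize_trit_uniform(weights, depth=4):
    """Uniform symmetric quantization to 3**depth balanced levels.

    depth=4 gives 81 integer codes in [-40, 40]. A minimal sketch with
    max-abs scaling, NOT the exact method from the paper.
    """
    levels = 3 ** depth               # 81 for depth 4
    max_code = (levels - 1) // 2      # 40
    scale = max(abs(w) for w in weights) / max_code or 1.0
    codes = [max(-max_code, min(max_code, round(w / scale))) for w in weights]
    dequant = [c * scale for c in codes]
    return codes, dequant, scale

# Raw information content of one depth-4 weight: 4 trits = log2(81) bits.
bits_per_weight = 4 * math.log2(3)    # ≈ 6.34 bits
```

With max-abs scaling, the largest-magnitude weight maps exactly to code ±40 and round-trips losslessly; everything else incurs at most half a quantization step of error.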
Performance and Use Cases
While the published weights are dequantized to FP16 for standard transformers compatibility (so the on-disk size matches the FP16 source), the model's real benefit is the reduced information content of its weights. This makes it particularly well suited to inference on specialized hardware that can process the packed trit format directly, such as systems leveraging the Entrit/tritllm-kernel. The model can be loaded with standard Hugging Face transformers library calls.
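To make "packed trit format" concrete, the sketch below expands each integer code into balanced-ternary digits and packs five digits per byte (3^5 = 243 ≤ 256). The actual `tritllm v2` on-disk layout is not documented here, so this encoding is assumed purely for illustration:

```python
def to_balanced_trits(code, depth=4):
    """Expand an integer code in [-(3**depth - 1)//2, +(3**depth - 1)//2]
    into `depth` balanced-ternary digits (-1, 0, +1), least significant first."""
    trits = []
    for _ in range(depth):
        r = (code + 1) % 3 - 1        # remainder in {-1, 0, +1}
        trits.append(r)
        code = (code - r) // 3
    return trits

def pack_trits(trits):
    """Pack balanced-ternary digits five to a byte (3**5 = 243 <= 256).
    NOTE: an illustrative layout, not the actual tritllm v2 codec."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)   # map -1/0/+1 to digits 0/1/2
        out.append(value)
    return bytes(out)

def unpack_trits(data, n):
    """Inverse of pack_trits; returns the first n digits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```

Packing five trits per byte yields 1.6 bits per trit of storage, close to the log2(3) ≈ 1.585-bit information content of a trit, which is why trit-native kernels can beat generic FP16 paths on memory bandwidth.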