Entrit/Qwen2.5-3B-trit-uniform-d1
Entrit/Qwen2.5-3B-trit-uniform-d1 is a 3.1-billion-parameter language model based on the Qwen2.5-3B architecture, featuring balanced ternary post-training quantization. Developed by Entrit, this model uses a 1.88 bits-per-weight scheme, reducing the information content of its weights by roughly 8.5× relative to FP16. It is designed for efficient inference on specialized hardware that can directly consume its packed trit format, making it suitable for resource-constrained environments.
Model Overview
Entrit/Qwen2.5-3B-trit-uniform-d1 is a quantized version of the Qwen/Qwen2.5-3B large language model, developed by Entrit. This model employs a novel balanced ternary post-training quantization (PTQ) method, as described in "Balanced Ternary Post-Training Quantization for Large Language Models" by Stentzel (2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-3B
- Quantization Depth: d=1 (3 levels per weight: −1, 0, +1)
- Bits per Weight (BPW): 1.88 BPW, roughly an 8.5× reduction in information content compared to standard FP16 weights (see the packing sketch after this list).
- Method: Uniform PTQ applied to all 2D linear-layer weight matrices.
- Codec: Utilizes `tritllm v2` for quantization, with the source available at Entrit/tritllm-codec.
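The exact on-disk format is defined by the codec in Entrit/tritllm-codec. As an illustration of the general idea only, the sketch below quantizes a weight matrix to balanced ternary with a per-row scale (the 0.7 × mean|w| threshold is a common absmean-style heuristic, not necessarily what `tritllm v2` uses) and packs five trits into each byte, since 3^5 = 243 ≤ 256. That yields 1.6 bits per trit, with scales and metadata plausibly accounting for the gap up to the reported 1.88 BPW.

```python
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Illustrative balanced-ternary PTQ (not the actual tritllm v2 codec).

    Uses a per-row scale and an absmean-style threshold to map weights
    to trits in {-1, 0, +1}.
    """
    scale = np.abs(w).mean(axis=1, keepdims=True)             # per-row scale
    trits = np.where(np.abs(w) > 0.7 * scale, np.sign(w), 0.0)
    return trits.astype(np.int8), scale.ravel()

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack 5 balanced trits per byte: 3**5 = 243 <= 256, i.e. 1.6 bits/trit."""
    flat = trits.ravel().astype(np.int64) + 1                 # {-1,0,1} -> {0,1,2}
    flat = np.concatenate([flat, np.zeros((-len(flat)) % 5, dtype=np.int64)])
    codes = flat.reshape(-1, 5) @ (3 ** np.arange(5))         # base-3 encoding
    return codes.astype(np.uint8)

w = np.random.randn(256, 256).astype(np.float32)
trits, scale = quantize_ternary(w)
packed = pack_trits(trits)
print(f"{packed.nbytes * 8 / w.size:.2f} bits per weight before scale overhead")
```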
Performance and Usage
While the published weights are dequantized to FP16 for compatibility with the standard `transformers` library, the model's true efficiency benefit lies in its reduced information content. This makes it particularly advantageous for inference on hardware specifically designed to process the packed trit format directly, leveraging kernels like those found in Entrit/tritllm-kernel.
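Because the checkpoint stores dequantized FP16 weights, it should load like any other causal LM. A minimal sketch, assuming the repository ships a standard Qwen2.5-style config and tokenizer (the prompt and generation settings here are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-3B-trit-uniform-d1"

# The stored weights are already dequantized to FP16, so this behaves
# like loading any other Qwen2.5-3B checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Balanced ternary is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```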
Good for
- Resource-constrained inference: Ideal for deployments where memory footprint and computational efficiency are critical, provided compatible hardware is used.
- Research into quantization techniques: Offers a practical example of balanced ternary quantization for LLMs.
- Exploring novel hardware acceleration: Suitable for use cases targeting specialized hardware that can natively process ternary weights (see the sketch after this list).
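To see why ternary-native hardware helps, note that a matrix-vector product against weights in {−1, 0, +1} needs no weight multiplications at all: each output element is a signed sum of activations, rescaled per row. The real kernels in Entrit/tritllm-kernel operate on the packed trit format; the toy sketch below, with made-up random trits and scales, only demonstrates the arithmetic identity.

```python
import numpy as np

def ternary_matvec(trits, scale, x):
    """Each output element is a signed sum of activations: no weight
    multiplications are needed when the weights are in {-1, 0, +1}."""
    out = np.array([x[row == 1].sum() - x[row == -1].sum() for row in trits])
    return out * scale

rng = np.random.default_rng(0)
trits = rng.integers(-1, 2, size=(4, 16)).astype(np.int8)  # stand-in ternary weights
scale = rng.random(4).astype(np.float32)                   # stand-in per-row scales
x = rng.standard_normal(16).astype(np.float32)

# Matches the dense FP computation with the dequantized weights.
assert np.allclose(ternary_matvec(trits, scale, x),
                   (trits * scale[:, None]) @ x, atol=1e-4)
```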