Entrit/Qwen2.5-0.5B-trit-uniform-d1
Entrit/Qwen2.5-0.5B-trit-uniform-d1 is a 0.5 billion parameter Qwen2.5 model developed by Entrit, featuring balanced ternary post-training quantization at 1.88 bits per weight. This model is optimized for efficient inference on hardware supporting packed trit formats, offering significant memory and computational savings. It is derived from Qwen/Qwen2.5-0.5B and is suitable for applications requiring reduced model footprint and faster processing.
Model Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d1 is a quantized version of the Qwen/Qwen2.5-0.5B language model, developed by Entrit. This model employs balanced ternary post-training quantization at a depth of d=1, resulting in an efficient 1.88 bits per weight. The quantization process is based on the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Method: Uniform Post-Training Quantization (PTQ)
- Bits per Weight: 1.88 bpw (3 levels per weight)
- Group Size: 16
- Quantized Layers: All 2D linear matrices
- FP16 Kept: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for compatibility and performance.
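
The details above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of group-wise balanced ternary quantization at depth d=1, not the Entrit codec itself: each group of 16 weights gets one scale (mean absolute value is used here as a simple choice) and its weights are rounded to the levels {-1, 0, +1}.

```python
import torch

GROUP_SIZE = 16  # matches the group size reported above

def quantize_ternary(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative group-wise balanced ternary quantization.

    Each row is split into groups of 16 weights; every group gets one
    FP16 scale and its weights are rounded to the levels {-1, 0, +1}.
    This is a generic PTQ sketch, not the exact Entrit codec.
    """
    rows, cols = w.shape
    groups = w.reshape(rows, cols // GROUP_SIZE, GROUP_SIZE)
    # One scale per group; the mean absolute value is a simple, common choice.
    scale = groups.abs().mean(dim=-1, keepdim=True).clamp(min=1e-8)
    trits = torch.clamp(torch.round(groups / scale), -1, 1)  # levels {-1, 0, +1}
    return trits.to(torch.int8), scale.to(torch.float16)

def dequantize_ternary(trits: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an FP16 weight matrix from trits and per-group scales."""
    rows = trits.shape[0]
    return (trits.float() * scale.float()).reshape(rows, -1).to(torch.float16)

# Round-trip example on a random 2D linear weight.
w = torch.randn(64, 128)
trits, scale = quantize_ternary(w)
w_hat = dequantize_ternary(trits, scale)
print("max abs error:", (w - w_hat.float()).abs().max().item())
```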
Performance and Compatibility
The checkpoint is stored dequantized to FP16 for compatibility with stock transformers, so its on-disk size is similar to the FP16 source; the 1.88 bpw figure reflects the true information content of the quantized weights. The model is therefore most efficient on specialized hardware that can directly consume the packed trit format, leveraging the Entrit/tritllm-kernel.
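
To illustrate why packed trits are compact, the sketch below packs five balanced trits into a single byte (3^5 = 243 ≤ 256), i.e. 1.6 bits per trit before per-group scale overhead. This is a generic base-3 packing scheme for illustration only; the actual layout consumed by Entrit/tritllm-kernel may differ.

```python
def pack_trits(trits: list[int]) -> bytes:
    """Pack balanced trits (-1, 0, +1) five per byte using base-3 coding.

    3**5 = 243 fits in one byte, so five trits cost 8 bits (1.6 bits/trit).
    Illustrative only; the real packed format may be laid out differently.
    """
    out = bytearray()
    for i in range(0, len(trits), 5):
        chunk = trits[i:i + 5] + [0] * (5 - len(trits[i:i + 5]))  # pad last chunk
        value = 0
        for t in reversed(chunk):
            value = value * 3 + (t + 1)  # map {-1, 0, +1} -> {0, 1, 2}
        out.append(value)
    return bytes(out)

def unpack_trits(data: bytes, n: int) -> list[int]:
    """Inverse of pack_trits: recover the first n balanced trits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)  # map {0, 1, 2} -> {-1, 0, +1}
            byte //= 3
        if len(trits) >= n:
            break
    return trits[:n]

# Round trip: 16 trits (one quantization group) pack into 4 bytes.
group = [1, -1, 0, 0, 1, 1, -1, 0, 1, 0, -1, -1, 0, 1, 1, 0]
packed = pack_trits(group)
assert unpack_trits(packed, len(group)) == group
print(len(packed), "bytes for", len(group), "trits")
```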
Use Cases
This model targets scenarios where memory footprint and inference latency are critical, such as edge devices or resource-constrained systems. Realizing the full benefit requires an inference environment that supports, or can be adapted to handle, the packed trit format efficiently.
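
For environments without trit-aware kernels, the repository can be used like any other Qwen2.5 checkpoint, since the stored weights are dequantized to FP16. A minimal usage sketch with the transformers library (the prompt and generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-0.5B-trit-uniform-d1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Arbitrary prompt; any text completion works with the base model.
inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```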