# Entrit/Qwen2.5-0.5B-trit-uniform-d4
Entrit/Qwen2.5-0.5B-trit-uniform-d4 is a 0.5 billion parameter Qwen2.5-based causal language model developed by Entrit. This model features balanced ternary post-training quantization at a depth of 4, resulting in 6.64 bits per weight. It is optimized for efficient inference on hardware that can directly consume its packed trit format, offering a quantized version of the original Qwen2.5-0.5B.
## Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d4 is a quantized version of the Qwen/Qwen2.5-0.5B model, developed by Entrit. This model utilizes balanced ternary post-training quantization (PTQ) with a depth of 4, achieving an information content of 6.64 bits per weight. This quantization method is based on research presented in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
## Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Method: Uniform PTQ with a depth of 4, i.e. four balanced trits per weight (3⁴ = 81 levels).
- Bits per Weight: 6.64, indicating significant compression compared to standard FP16.
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Layers: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for compatibility and performance.
- Codec: `tritllm v2` is used for the quantization process.
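To illustrate how a depth of 4 yields 81 levels, the sketch below quantizes a weight to the nearest of the 81 integer levels in [-40, 40] and decomposes it into four balanced trits. This is a minimal illustration of the arithmetic, not the `tritllm v2` codec itself; the per-tensor scale and the least-significant-trit-first ordering are assumptions.

```python
import numpy as np

DEPTH = 4
LEVELS = 3 ** DEPTH            # 81 representable values per weight
HALF = (LEVELS - 1) // 2       # integer levels span [-40, 40]

def quantize(w, scale):
    """Round scaled weights to the nearest of the 81 integer levels."""
    return np.clip(np.round(w / scale), -HALF, HALF).astype(np.int64)

def to_trits(q):
    """Decompose an integer level in [-40, 40] into 4 balanced trits {-1, 0, 1}."""
    trits = []
    for _ in range(DEPTH):
        r = (q + 1) % 3 - 1    # balanced remainder in {-1, 0, 1}
        trits.append(r)
        q = (q - r) // 3
    return trits               # least-significant trit first (assumed order)

def from_trits(trits):
    """Reassemble an integer level from its balanced trits."""
    return sum(t * 3 ** i for i, t in enumerate(trits))
```

Every level round-trips through the trit decomposition, e.g. `from_trits(to_trits(-40))` returns `-40`.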
## Usage and Performance Considerations
While the model's on-disk size is similar to its FP16 source due to dequantization for stock transformers compatibility, its true efficiency is realized when used with hardware and kernels designed for packed trit formats (e.g., Entrit/tritllm-kernel). This makes it particularly suitable for scenarios where reduced memory footprint and faster inference with specialized hardware are critical.
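One way a packed trit format can work is to store five trits per byte, since 3⁵ = 243 ≤ 256, giving 8/5 = 1.6 bits per trit (6.4 bits for a depth-4 weight, in the neighborhood of the 6.64 bits per weight quoted above once metadata is counted). The sketch below is a hypothetical packing for illustration; the actual layout used by `Entrit/tritllm-kernel` is not documented here.

```python
def pack_trits(trits):
    """Pack balanced trits {-1, 0, 1} five per byte (base-3 encoding)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):
            val = val * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}
        out.append(val)              # 3^5 = 243 fits in one byte
    return bytes(out)

def unpack_trits(data, n):
    """Recover the first n trits from packed bytes."""
    trits = []
    for b in data:
        for _ in range(5):
            trits.append(b % 3 - 1)  # map {0, 1, 2} back to {-1, 0, 1}
            b //= 3
    return trits[:n]
```

A tensor of one million depth-4 weights (four million trits) would occupy 800,000 bytes under this scheme, versus 2 MB in FP16.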