Entrit/Qwen2.5-1.5B-trit-uniform-d2: A Quantized Qwen2.5 Model
This model is a 1.5 billion parameter variant of the Qwen2.5 architecture, developed by Entrit Systems. Its primary distinction lies in its balanced ternary post-training quantization (PTQ), achieved at a depth of d=2, which translates to 9 levels per weight and an information content of 3.47 bits per weight. This quantization was performed using the codec from "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
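The 3.47 bits-per-weight figure can be reproduced from the quantization parameters given in this card (group size 16, depth d=2, 27-entry scale codebook). A minimal sketch, assuming each group of 16 weights stores 2 trits per weight plus one 3-trit index into the 27-entry codebook (3³ = 27):

```python
import math

BITS_PER_TRIT = math.log2(3)      # ~1.585 bits of information per trit

group_size = 16                   # weights per quantization group
trits_per_weight = 2              # depth d=2 -> 3**2 = 9 levels per weight
scale_index_trits = 3             # 27-entry codebook -> 3 trits (3**3 = 27)

# 16 * 2 + 3 = 35 trits per group of 16 weights
trits_per_group = group_size * trits_per_weight + scale_index_trits
bits_per_weight = trits_per_group * BITS_PER_TRIT / group_size

print(f"levels per weight: {3 ** trits_per_weight}")   # 9
print(f"bits per weight:   {bits_per_weight:.2f}")     # 3.47
```

Note that 2 trits alone carry only log₂9 ≈ 3.17 bits; the remaining ~0.30 bits per weight come from amortizing the per-group scale index.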
Key Capabilities and Features
- Efficient Quantization: Utilizes a uniform PTQ method with a group size of 16 and a 27-entry log-spaced scale codebook.
- Reduced Information Content: Achieves 3.47 bits per weight, making it highly efficient for specialized hardware that can process packed trit formats directly.
- `transformers` Compatibility: The on-disk checkpoint is similar in size to the FP16 source because weights are dequantized to FP16 for standard `transformers` library compatibility; the efficiency gain lies in the quantized representation itself.
- Targeted Quantization: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and all `*_norm` layers are kept in FP16 to preserve critical model components.
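The per-group scheme can be illustrated with a minimal sketch. The function names and the max-abs scale rule below are illustrative assumptions, not the codec from the cited paper: depth d=2 balanced ternary represents each weight as an integer level in [-4, 4] (two trit digits, a·3 + b with a, b ∈ {-1, 0, 1}, giving 9 levels), scaled per group of 16.

```python
def quantize_group(weights, depth=2):
    """Uniform balanced-ternary PTQ of one weight group (illustrative sketch)."""
    max_level = (3 ** depth - 1) // 2               # d=2 -> levels in [-4, 4]
    scale = max(abs(w) for w in weights) / max_level or 1.0
    levels = [max(-max_level, min(max_level, round(w / scale))) for w in weights]
    return scale, levels

def dequantize_group(scale, levels):
    """Reconstruct approximate FP weights from a scale and integer levels."""
    return [scale * q for q in levels]

group = [0.31, -0.12, 0.05, -0.40, 0.22, 0.01, -0.27, 0.18,
         0.09, -0.33, 0.14, 0.02, -0.08, 0.37, -0.19, 0.26]
scale, levels = quantize_group(group)
approx = dequantize_group(scale, levels)
```

In the released model, the per-group scale would additionally be snapped to the nearest entry of the 27-entry log-spaced codebook rather than stored exactly.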
When to Use This Model
This model is particularly well-suited for applications where memory footprint and inference speed on specialized hardware are critical. Developers who want to experiment with or deploy balanced ternary quantization will find it a convenient starting point: it provides a quantized version of the Qwen2.5-1.5B base model that trades a small amount of accuracy for a substantially smaller effective weight representation.
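The model card does not specify the packed trit layout consumed by such hardware; a common base-3 scheme, shown here purely as an assumption, packs 5 trits per byte (3⁵ = 243 ≤ 256), i.e. 1.6 storage bits per trit against ~1.585 bits of information:

```python
def pack_trits(trits):
    """Pack balanced trits (-1/0/+1) into bytes, 5 trits per byte (3**5 = 243)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)   # map -1/0/+1 -> 0/1/2, base-3 digits
        out.append(value)
    return bytes(out), len(trits)         # keep the count to handle a short tail

def unpack_trits(packed, n):
    """Invert pack_trits, recovering the first n balanced trits."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```

A round trip (`unpack_trits(*pack_trits(ts)) == ts`) preserves the trit stream, and the packed form is n/5 bytes rounded up, versus one byte per trit naively.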