# Entrit/Qwen2.5-1.5B-trit-uniform-d1
Entrit/Qwen2.5-1.5B-trit-uniform-d1 is a 1.5 billion parameter causal language model from Entrit, based on Qwen/Qwen2.5-1.5B, with balanced ternary post-training quantization applied to its weights. At 1.88 bits per weight, the quantized representation is far more compact than the FP16 original. The model targets efficient inference on specialized hardware that can consume its packed trit format directly, making it suitable for resource-constrained environments.
## Entrit/Qwen2.5-1.5B-trit-uniform-d1: A Quantized Qwen2.5 Model
This model is a 1.5 billion parameter variant of the Qwen/Qwen2.5-1.5B architecture, developed by Entrit. Its primary distinguishing feature is the application of balanced ternary post-training quantization (PTQ), reducing the model's weight representation to 1.88 bits per weight (bpw).
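For intuition, balanced ternary PTQ maps each weight to one of three values, {-1, 0, +1}, times a floating-point scale. A minimal PyTorch sketch, assuming a per-row mean-absolute-value scaling rule (the actual `tritllm v2` codec may scale and group weights differently):

```python
import torch

def quantize_ternary(w: torch.Tensor):
    """Round weights to balanced ternary {-1, 0, +1} with a per-row FP16
    scale. Hypothetical sketch; not the actual tritllm v2 codec."""
    # Mean-absolute-value scaling, as in common ternary schemes (e.g. TWN);
    # the paper's real scaling and grouping rule may differ.
    scale = w.abs().mean(dim=-1, keepdim=True).clamp(min=1e-8)
    trits = torch.clamp(torch.round(w / scale), -1, 1).to(torch.int8)
    return trits, scale.half()

def dequantize_ternary(trits: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct FP16 weights usable by standard transformers kernels.
    return trits.half() * scale

# Example: quantize a random 4x8 "linear layer" weight matrix.
w = torch.randn(4, 8)
trits, scale = quantize_ternary(w)
w_hat = dequantize_ternary(trits, scale)
```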
## Key Quantization Details
- Source Model: Qwen/Qwen2.5-1.5B
- Quantization Method: Uniform PTQ with depth d=1, i.e. three levels per weight ({-1, 0, +1}).
- Bits per Weight: 1.88 bpw, a small fraction of the 16 bpw of the FP16 source.
- Codec: Utilizes the `tritllm v2` codec, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026). A packing sketch follows this list.
- Selective Quantization: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and `*_norm` layers are kept in FP16 for performance and stability.
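Sub-2-bit storage comes from packing trits densely rather than giving each its own byte. The sketch below shows one common approach, offered purely as an illustration (the actual `tritllm v2` layout is defined in the paper): five trits per byte, since 3^5 = 243 ≤ 256, yields 1.6 bits per trit before metadata; overheads such as per-channel scales plausibly account for the gap up to the reported 1.88 bpw.

```python
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced-ternary digits {-1, 0, +1} five to a byte
    (3**5 = 243 <= 256), i.e. 1.6 bits per trit before metadata.
    Illustrative layout only, not the actual tritllm v2 format."""
    digits = (trits.astype(np.int64) + 1).ravel()        # map to {0, 1, 2}
    digits = np.pad(digits, (0, (-len(digits)) % 5))     # pad to multiple of 5
    powers = 3 ** np.arange(5, dtype=np.int64)           # base-3 place values
    return (digits.reshape(-1, 5) * powers).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    powers = 3 ** np.arange(5, dtype=np.int64)
    digits = (packed[:, None].astype(np.int64) // powers) % 3
    return (digits.ravel()[:n] - 1).astype(np.int8)      # back to {-1, 0, +1}

# Round-trip check on random trits.
t = np.random.randint(-1, 2, size=123).astype(np.int8)
assert np.array_equal(unpack_trits(pack_trits(t), t.size), t)
```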
## Performance and Use Cases
Because the published checkpoint stores the weights dequantized back to FP16 for compatibility with standard transformers loading, its on-disk size is similar to that of the FP16 source. The 1.88 bpw figure instead reflects the packed trit representation, which enables highly efficient inference on hardware and kernels that consume the trit format directly (e.g., Entrit/tritllm-kernel), reducing memory footprint and speeding up computation. Developers exploring extreme quantization for resource-constrained deployments should consider this model.
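Assuming the checkpoint ships the dequantized FP16 weights in standard transformers format (as the on-disk size suggests), loading should follow the usual Hugging Face pattern; a minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # weights are stored dequantized in FP16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Balanced ternary quantization", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this path runs standard FP16 kernels; the packed-trit speed and memory benefits require a compatible kernel such as Entrit/tritllm-kernel.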