Entrit/Qwen2.5-0.5B-trit-uniform-d2
Entrit/Qwen2.5-0.5B-trit-uniform-d2 is a 0.5-billion-parameter language model based on the Qwen2.5 architecture, developed by Entrit. It features balanced ternary post-training quantization (PTQ) at depth 2, giving an information content of 3.47 bits per weight. The model is designed for efficient inference on hardware that supports packed trit formats, offering a compact representation while remaining compatible with stock transformers through dequantized FP16 weights.
Model Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d2 is a 0.5-billion-parameter model derived from the Qwen2.5-0.5B base model. Its distinguishing feature is balanced ternary post-training quantization (PTQ), developed by Eric Stentzel at Entrit Systems. The scheme uses a depth of 2, i.e. two balanced trits and therefore 3² = 9 levels per weight, which translates to an information content of 3.47 bits per weight.
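To see where the 9 levels come from, one natural realization of a depth-2 balanced ternary code is to combine two trits (t1, t0), each in {-1, 0, +1}, as 3·t1 + t0. The sketch below is purely illustrative; the actual encoding used by the `tritllm v2` codec may differ.

```python
from itertools import product

# Depth 2 = two balanced trits per weight, each trit in {-1, 0, +1}.
# Interpreting the pair (t1, t0) as 3*t1 + t0 gives 3**2 = 9 integer levels.
TRITS = (-1, 0, 1)

levels = sorted({3 * t1 + t0 for t1, t0 in product(TRITS, repeat=2)})
print(levels)       # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
print(len(levels))  # 9 levels per weight
```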
Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Method: Uniform PTQ with a group size of 16.
- Bits per Weight: 3.47, indicating a highly compressed representation.
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Layers: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for precision.
- Codec: Uses the `tritllm v2` codec, available via `Entrit/tritllm-codec`.
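For intuition about what these settings mean in practice, here is a minimal sketch of round-to-nearest, symmetric, group-wise quantization to 9 levels with group size 16. This is an assumption-laden illustration, not the `tritllm v2` codec; the function names and the symmetric max-scaling rule are hypothetical.

```python
import torch

GROUP_SIZE = 16   # per the card: uniform PTQ with group size 16
QMAX = 4          # depth-2 balanced ternary integer levels span -4 .. +4 (9 levels)

def quantize_groups(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative round-to-nearest, symmetric, per-group quantizer.

    Returns integer codes in [-4, 4] plus one FP16 scale per group of 16
    weights. A sketch only -- not the actual tritllm v2 codec.
    """
    groups = w.reshape(-1, GROUP_SIZE)                      # flatten into groups of 16
    scale = groups.abs().amax(dim=1, keepdim=True) / QMAX   # symmetric per-group scale
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    codes = torch.clamp(torch.round(groups / scale), -QMAX, QMAX).to(torch.int8)
    return codes, scale.to(torch.float16)

def dequantize_groups(codes: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    """Reconstruct FP16 weights, which is what the published checkpoint stores."""
    return (codes.float() * scale.float()).reshape(shape).to(torch.float16)

# Example: quantize and reconstruct one linear weight matrix.
w = torch.randn(64, 64)
codes, scale = quantize_groups(w)
w_hat = dequantize_groups(codes, scale, w.shape)
print((w - w_hat.float()).abs().max())  # small reconstruction error
```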
Performance and Use Cases
The weights in this repository are dequantized to FP16 for compatibility with stock transformers, so the on-disk size matches the FP16 source. The efficiency benefit is realized only when the packed trit format is consumed directly by capable hardware, which makes the model particularly suitable for:
- Resource-constrained environments: Where memory footprint and computational efficiency are critical.
- Specialized hardware: Designed to leverage ternary or low-bit quantization for faster inference.
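Because the checkpoint ships dequantized FP16 weights, it should load like any other Qwen2.5 model through the standard transformers API. A minimal usage sketch (not verified against this specific repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Entrit/Qwen2.5-0.5B-trit-uniform-d2"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16)

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```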
This model represents an exploration into highly efficient model deployment through advanced quantization techniques, as detailed in the forthcoming paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).