# Entrit/Qwen2.5-7B-trit-uniform-d2
Entrit/Qwen2.5-7B-trit-uniform-d2 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, developed by Entrit Systems. It applies balanced ternary post-training quantization at depth d=2, yielding 3.47 bits per weight. The model is designed for efficient inference on hardware that can process packed trit formats, offering a highly compressed representation of the original Qwen2.5-7B model.
## Overview
Entrit/Qwen2.5-7B-trit-uniform-d2 is a quantized version of the Qwen/Qwen2.5-7B large language model. It implements the balanced ternary post-training quantization (PTQ) scheme detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026), using a depth of d=2, which gives 3^2 = 9 levels per weight at 3.47 bits per weight (versus 16 bits for the FP16 source).
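For intuition, the quoted bit count can be reconstructed from the numbers on this card. The sketch below (plain Python; the decoding convention `t1*3 + t0` is an illustrative assumption, not the documented tritllm v2 layout) enumerates the 9 levels of a depth-2 balanced ternary code and adds the per-group scale overhead implied by the 27-entry scale codebook and group size 16 listed below:

```python
import math

# Depth d=2 balanced ternary: each weight is a pair of trits (t1, t0),
# each trit in {-1, 0, +1}, decoded here as t1*3 + t0 (assumed convention).
levels = sorted(t1 * 3 + t0 for t1 in (-1, 0, 1) for t0 in (-1, 0, 1))
print(levels)        # [-4, -3, -2, -1, 0, 1, 2, 3, 4] -- 3^2 = 9 levels

# log2(9) bits per weight for the trits, plus the per-group scale index
# (one of 27 codebook entries, shared across a group of 16 weights).
bpw = math.log2(9) + math.log2(27) / 16
print(f"{bpw:.2f}")  # 3.47
```

The close match to the quoted 3.47 suggests the figure covers both the trits and the shared scale indices, although the card describes it only as the information content of the quantized matrices.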
## Key Quantization Details
- Source Model: Qwen/Qwen2.5-7B
- Quantization Method: Uniform PTQ with a balanced ternary codec (tritllm v2).
- Bits per Weight: 3.47, representing the information content of the quantized matrices.
- Quantized Layers: All 2D linear matrices are quantized; `lm_head`, token embeddings, and `*_norm` layers remain in FP16 for compatibility and performance.
- Group Size: 16, with a 27-entry log-spaced scale codebook (a sketch of this scheme follows the list).
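To make the group-wise scheme concrete, here is a minimal sketch of a symmetric group quantizer under the parameters above. The codebook range (2^-8 to 2^0), the nearest-scale selection, and round-to-nearest level assignment are illustrative assumptions, not the tritllm v2 specification:

```python
import numpy as np

GROUP_SIZE = 16
# 27 log-spaced scales; the actual codebook range is not documented here.
CODEBOOK = np.logspace(-8, 0, num=27, base=2.0)

def quantize_group(w):
    """Quantize one group of 16 FP weights to (scale index, integer levels)."""
    target = np.abs(w).max() / 4.0                   # map the largest weight near level +/-4
    s_idx = int(np.abs(CODEBOOK - target).argmin())  # nearest codebook scale
    q = np.clip(np.round(w / CODEBOOK[s_idx]), -4, 4).astype(np.int8)
    return s_idx, q                                  # q holds the 9 levels {-4, ..., +4}

def dequantize_group(s_idx, q):
    return CODEBOOK[s_idx] * q.astype(np.float32)

# Round trip on one random group of weights.
w = np.random.randn(GROUP_SIZE).astype(np.float32) * 0.02
s_idx, q = quantize_group(w)
print("max abs error:", np.abs(w - dequantize_group(s_idx, q)).max())
```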
## Usage and Performance
Because the published weights are dequantized for standard transformers compatibility, the on-disk size remains similar to the FP16 source. The true benefit of this model lies in its optimized format for hardware that can consume packed trit data directly, making it particularly suitable for scenarios requiring highly efficient inference with a reduced memory footprint at the hardware level. The model can be loaded with standard transformers library functions, with weights arriving as FP16.
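Loading follows the usual transformers pattern; the snippet below assumes a recent transformers release with native Qwen2.5 support and `accelerate` installed for `device_map="auto"`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights arrive as FP16, per this card
    device_map="auto",
)

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

As for the packed trit path, the card does not specify tritllm's packing; one common scheme for trit data, shown here purely as an assumption, packs five trits per byte, since 3^5 = 243 ≤ 256:

```python
def pack5(trits):
    """Pack 5 balanced trits (each in {-1, 0, +1}) into one byte, base-3."""
    b = 0
    for t in trits:
        b = b * 3 + (t + 1)  # shift each trit to an unsigned digit {0, 1, 2}
    return b                 # 0..242, fits in a single byte

def unpack5(b):
    trits = []
    for _ in range(5):
        trits.append(b % 3 - 1)
        b //= 3
    return trits[::-1]

assert unpack5(pack5([1, -1, 0, 1, -1])) == [1, -1, 0, 1, -1]
```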