Entrit/Qwen2.5-7B-trit-uniform-d4: Quantized Qwen2.5-7B
This model is a balanced ternary post-training quantized version of the 7.6 billion parameter Qwen/Qwen2.5-7B model. Developed by Entrit Systems, it employs a novel quantization codec described in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Quantization Method: Uniform Post-Training Quantization (PTQ).
- Depth: d=4 balanced-ternary digits per weight, giving 3^4 = 81 quantization levels.
- Bits per Weight: Achieves an information content of 6.64 bits per weight, significantly reducing the effective size of the model's matrices.
- Source Model: Based on the robust Qwen2.5-7B architecture.
- Codec: Utilizes `tritllm v2` for the quantization process.
- Layers Quantized: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for stability (see the sketch below).
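
To make the policy above concrete, here is a minimal, hypothetical sketch of uniform quantization to 3^4 = 81 levels applied to the 2D linear weights while skipping `lm_head`, embeddings, and norm layers. The layer-selection rule and per-output-channel scaling are illustrative assumptions, not the `tritllm v2` codec itself.

```python
import torch
import torch.nn as nn

def fake_quantize_linear_weights(model: nn.Module, depth: int = 4) -> None:
    """Round every 2D linear weight to 3**depth uniform levels, in place."""
    levels = 3 ** depth                    # d=4 -> 81 levels
    max_level = (levels - 1) // 2          # balanced range: -40 .. +40
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue                       # embeddings and *_norm layers stay FP16
        if "lm_head" in name:
            continue                       # lm_head stays FP16 as well
        w = module.weight.data
        # Hypothetical per-output-channel scale: largest weight maps to +/- max_level.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / max_level
        q = torch.round(w / scale).clamp_(-max_level, max_level)
        module.weight.data = (q * scale).to(w.dtype)  # stored dequantized, as described below
```

Because four balanced-ternary digits cover every integer from -40 to +40, rounding to those integers and rescaling is exactly a uniform 81-level quantizer.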
Performance and Compatibility
While the on-disk size is equivalent to the FP16 source due to dequantization for transformers compatibility, the 6.64-bpw figure represents the true information content. This makes the model particularly suitable for inference on specialized hardware that can directly process the packed trit format, leveraging the reduced bit-width for improved efficiency. The associated codec and kernel are available via Entrit/tritllm-codec and Entrit/tritllm-kernel respectively.
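
Since the repository stores dequantized FP16 weights, the checkpoint loads like any other Qwen2.5 model through the standard transformers API. The snippet below is a minimal usage sketch; the prompt and generation settings are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="float16", device_map="auto"
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```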