Entrit/Qwen2.5-7B-trit-uniform-d3
Entrit/Qwen2.5-7B-trit-uniform-d3 is a 7.6-billion-parameter language model developed by Entrit, based on the Qwen2.5-7B architecture. It applies balanced ternary post-training quantization (PTQ) at depth d=3, giving an information content of 5.05 bits per weight. The model is optimized for efficient inference on hardware that can consume packed trit formats, offering a compact representation of the original Qwen2.5-7B model.
Overview
Entrit/Qwen2.5-7B-trit-uniform-d3 is a quantized version of the Qwen/Qwen2.5-7B model, developed by Entrit Systems. It employs a novel balanced ternary post-training quantization (PTQ) method, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" by Eric Stentzel (2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-7B
- Quantization Method: Uniform PTQ at depth d=3, i.e. three balanced trits per weight, yielding 3^3 = 27 levels (see the sketch after this list).
- Efficiency: Achieves an information content of 5.05 bits per weight, down from 16 bits per weight in the FP16 source.
- Compatibility: The on-disk size remains similar to the FP16 source because the weights are dequantized for stock `transformers` compatibility; the true efficiency is realized on hardware that directly processes the packed trit format.
- Quantized Layers: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and all `*_norm` layers are kept in FP16 for stability.
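
To make the scheme above concrete, here is a minimal NumPy sketch (not Entrit's implementation) of uniform symmetric quantization to the 27 balanced-ternary levels {-13, ..., +13}, balanced-trit encoding, and one illustrative way to pack five trits per byte (valid since 3^5 = 243 <= 256). The function names, the single per-tensor scale, and the packing layout are assumptions for illustration; three trits carry log2(27) ≈ 4.75 bits of raw payload, and the gap to the stated 5.05 bits per weight is presumably scale/metadata overhead.

```python
import numpy as np

D = 3                      # trits per weight (quantization depth d=3)
LMAX = (3 ** D - 1) // 2   # 13 -> integer levels -13..13, i.e. 3^3 = 27 levels

def quantize(w: np.ndarray):
    """Uniform symmetric PTQ: round scaled weights to the 27 ternary levels.
    A single per-tensor scale is an assumption; real PTQ pipelines often use
    per-channel or per-group scales (the metadata that can push the stored
    cost above the raw ~4.75 bits per weight)."""
    scale = np.abs(w).max() / LMAX
    q = np.clip(np.round(w / scale), -LMAX, LMAX).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def to_balanced_trits(q: np.ndarray) -> np.ndarray:
    """Encode each integer level as D balanced trits in {-1, 0, +1},
    least-significant trit first (value = sum(trit[i] * 3**i))."""
    v = q.astype(np.int64)
    trits = np.empty(q.shape + (D,), dtype=np.int8)
    for i in range(D):
        r = ((v + 1) % 3) - 1   # balanced remainder in {-1, 0, +1}
        trits[..., i] = r
        v = (v - r) // 3
    return trits

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """One illustrative packing: 5 trits per byte (3^5 = 243 <= 256).
    The repo's actual on-disk layout is not specified in this card."""
    t = trits.reshape(-1).astype(np.int64) + 1              # {-1,0,1} -> {0,1,2}
    t = np.pad(t, (0, (-t.size) % 5), constant_values=1)    # pad with trit 0
    codes = (t.reshape(-1, 5) * 3 ** np.arange(5)).sum(axis=1)
    return codes.astype(np.uint8)

# Round-trip check on a toy weight matrix
w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize(w)
assert np.array_equal((to_balanced_trits(q) * 3 ** np.arange(D)).sum(axis=-1), q)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```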
Use Cases
This model is suited to applications that need a reduced memory footprint and faster inference when deployed on specialized hardware that handles balanced ternary formats natively. It provides a compact representation of the powerful Qwen2.5-7B model, making it a good fit for edge devices and other environments with strict resource constraints.
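
If the repository follows the stock `transformers` layout implied by the compatibility note above, loading should work like any other Qwen2.5 checkpoint. A minimal, unverified sketch (`device_map="auto"` assumes `accelerate` is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # FP16 weights per the compatibility note above
    device_map="auto",    # requires `accelerate`
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that until kernels consuming the packed trit format directly are available, this path runs the dequantized FP16 weights, so it demonstrates correctness rather than the memory savings described above.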