Entrit/Qwen2.5-1.5B-trit-uniform-d3

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Entrit/Qwen2.5-1.5B-trit-uniform-d3 is a 1.5-billion-parameter Qwen2.5 model from Entrit with balanced ternary post-training quantization (PTQ) at depth 3, giving 5.05 bits per weight. The model targets efficient inference on specialized hardware that can consume the packed trit format directly, where it carries far less information content than its FP16 source. It suits applications that need a smaller model footprint and faster inference with minimal loss of quality.


Overview

Entrit/Qwen2.5-1.5B-trit-uniform-d3 is a 1.5-billion-parameter language model based on the Qwen2.5 architecture, developed by Entrit. Its distinguishing feature is balanced ternary post-training quantization (PTQ) at depth 3 (27 levels per weight), which corresponds to an information content of 5.05 bits per weight. The quantization was performed with the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
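Because the published weights are stored dequantized to FP16 (see "Unique Characteristics" below), the checkpoint should load through the standard transformers API. The snippet below is a minimal usage sketch under that assumption; the prompt and generation settings are illustrative, not recommendations from this card.

```python
# Minimal usage sketch: the card states the weights are shipped dequantized to
# FP16, so standard transformers loading is assumed to work as-is.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```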

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-1.5B
  • Quantization Depth: d=3 (27 levels)
  • Bits per Weight: 5.05
  • Method: Uniform PTQ applied to all 2D linear weight matrices (see the sketch after this list).
  • Excluded Layers: lm_head, token embeddings, and all *_norm layers remain in FP16.
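The codec itself is defined in Stentzel (2026); the sketch below only illustrates what uniform 27-level (depth-3 balanced ternary) quantization of a single weight matrix looks like. The per-tensor max-abs scale and the `quantize_d3` / `to_trits` helpers are assumptions made for this example, not the repository's actual codec.

```python
# Illustrative sketch of uniform 27-level (depth-3 balanced ternary) PTQ on one
# 2D weight matrix. The per-tensor max-abs scale is an assumption for this
# example; the real codec is described in Stentzel (2026).
import numpy as np

def quantize_d3(weights: np.ndarray):
    """Map FP weights to integer levels in [-13, 13] (27 levels) plus a scale."""
    scale = np.abs(weights).max() / 13.0            # 13 = (3**3 - 1) / 2
    q = np.clip(np.round(weights / scale), -13, 13).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct FP16 weights, which is the form shipped in this repository."""
    return (q.astype(np.float32) * scale).astype(np.float16)

def to_trits(q: np.ndarray) -> np.ndarray:
    """Decompose each level into 3 balanced-ternary digits in {-1, 0, 1}."""
    trits = np.empty(q.shape + (3,), dtype=np.int8)
    v = q.astype(np.int32)
    for i in range(3):
        r = ((v + 1) % 3) - 1                       # balanced remainder
        trits[..., i] = r
        v = (v - r) // 3
    return trits

W = np.random.randn(8, 8).astype(np.float32)
q, s = quantize_d3(W)
W_hat = dequantize(q, s)
print("max error:", np.abs(W - W_hat).max(), "| levels used:", np.unique(q).size)
```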

Unique Characteristics

The published weights are dequantized to FP16 so the model loads with standard transformers libraries, but the format's real efficiency is realized on hardware that can process the packed trit representation directly. In that setting, the reduced information content translates into a smaller memory footprint and faster inference where those resources are critical. Note that the on-disk size of this repository remains similar to the FP16 source, because the weights are stored dequantized for transformers compatibility; the underlying ternary representation is significantly more compact.
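The packed trit layout is not documented on this card. As one illustration of why a trit stream is compact, the sketch below packs five balanced-ternary trits into each byte (3^5 = 243 fits in a uint8); this packing scheme is an assumption chosen for the example and is not necessarily the layout produced by Entrit's codec.

```python
# Illustrative packing of balanced-ternary trits (values in {-1, 0, 1}) into
# bytes, five trits per byte since 3**5 = 243 fits in a uint8. This layout is
# assumed for illustration only; the Entrit codec's actual format may differ.
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack a flat int8 array of trits in {-1, 0, 1} into uint8 bytes."""
    flat = trits.reshape(-1) + 1                      # shift to {0, 1, 2}
    pad = (-len(flat)) % 5
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = flat.reshape(-1, 5).astype(np.uint16)
    powers = np.array([1, 3, 9, 27, 81], dtype=np.uint16)
    return (groups * powers).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n trits from the packed byte stream."""
    vals = packed.astype(np.uint16)
    digits = []
    for _ in range(5):
        digits.append((vals % 3).astype(np.int8) - 1)  # back to {-1, 0, 1}
        vals //= 3
    return np.stack(digits, axis=1).reshape(-1)[:n]

trits = np.random.randint(-1, 2, size=100).astype(np.int8)
packed = pack_trits(trits)
assert np.array_equal(unpack_trits(packed, trits.size), trits)
print(f"{trits.size} trits -> {packed.size} bytes")
```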