Entrit/Qwen2.5-1.5B-trit-uniform-d3

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Entrit/Qwen2.5-1.5B-trit-uniform-d3 is a 1.5-billion-parameter Qwen2.5 model from Entrit with balanced ternary post-training quantization (PTQ) at depth 3, giving 5.05 bits per weight. The model targets efficient inference on specialized hardware that can consume the packed trit format directly, where it carries far less information content than its FP16 source. It suits applications that need a smaller model footprint and faster inference with minimal loss of quality.


Overview

Entrit/Qwen2.5-1.5B-trit-uniform-d3 is a 1.5-billion-parameter language model based on the Qwen2.5 architecture, developed by Entrit. Its distinguishing feature is balanced ternary post-training quantization (PTQ) at depth 3 (27 levels per weight), which corresponds to an information content of 5.05 bits per weight. The quantization was performed with the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
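Because the published weights are stored dequantized to FP16 (see "Unique Characteristics" below), the checkpoint should load through the standard transformers API. The snippet below is a minimal usage sketch under that assumption; the prompt and generation settings are illustrative, not recommendations from this card.

```python
# Minimal usage sketch: the card states the weights are shipped dequantized to
# FP16, so standard transformers loading is assumed to work as-is.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```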

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-1.5B
  • Quantization Depth: d=3 (27 levels)
  • Bits per Weight: 5.05
  • Method: Uniform PTQ applied to all 2D linear weight matrices (see the sketch after this list).
  • Excluded Layers: lm_head, token embeddings, and all *_norm layers remain in FP16.
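The codec itself is defined in Stentzel (2026); the sketch below only illustrates what uniform 27-level (depth-3 balanced ternary) quantization of a single weight matrix looks like. The per-tensor max-abs scale and the `quantize_d3` / `to_trits` helpers are assumptions made for this example, not the repository's actual codec.

```python
# Illustrative sketch of uniform 27-level (depth-3 balanced ternary) PTQ on one
# 2D weight matrix. The per-tensor max-abs scale is an assumption for this
# example; the real codec is described in Stentzel (2026).
import numpy as np

def quantize_d3(weights: np.ndarray):
    """Map FP weights to integer levels in [-13, 13] (27 levels) plus a scale."""
    scale = np.abs(weights).max() / 13.0            # 13 = (3**3 - 1) / 2
    q = np.clip(np.round(weights / scale), -13, 13).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct FP16 weights, which is the form shipped in this repository."""
    return (q.astype(np.float32) * scale).astype(np.float16)

def to_trits(q: np.ndarray) -> np.ndarray:
    """Decompose each level into 3 balanced-ternary digits in {-1, 0, 1}."""
    trits = np.empty(q.shape + (3,), dtype=np.int8)
    v = q.astype(np.int32)
    for i in range(3):
        r = ((v + 1) % 3) - 1                       # balanced remainder
        trits[..., i] = r
        v = (v - r) // 3
    return trits

W = np.random.randn(8, 8).astype(np.float32)
q, s = quantize_d3(W)
W_hat = dequantize(q, s)
print("max error:", np.abs(W - W_hat).max(), "| levels used:", np.unique(q).size)
```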

Unique Characteristics

The published weights are dequantized to FP16 so the model loads with standard transformers libraries, but the format's real efficiency is realized on hardware that can process the packed trit representation directly. In that setting, the reduced information content translates into a smaller memory footprint and faster inference where those resources are critical. Note that the on-disk size of this repository remains similar to the FP16 source, because the weights are stored dequantized for transformers compatibility; the underlying ternary representation is significantly more compact.
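The packed trit layout is not documented on this card. As one illustration of why a trit stream is compact, the sketch below packs five balanced-ternary trits into each byte (3^5 = 243 fits in a uint8); this packing scheme is an assumption chosen for the example and is not necessarily the layout produced by Entrit's codec.

```python
# Illustrative packing of balanced-ternary trits (values in {-1, 0, 1}) into
# bytes, five trits per byte since 3**5 = 243 fits in a uint8. This layout is
# assumed for illustration only; the Entrit codec's actual format may differ.
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack a flat int8 array of trits in {-1, 0, 1} into uint8 bytes."""
    flat = trits.reshape(-1) + 1                      # shift to {0, 1, 2}
    pad = (-len(flat)) % 5
    flat = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = flat.reshape(-1, 5).astype(np.uint16)
    powers = np.array([1, 3, 9, 27, 81], dtype=np.uint16)
    return (groups * powers).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n trits from the packed byte stream."""
    vals = packed.astype(np.uint16)
    digits = []
    for _ in range(5):
        digits.append((vals % 3).astype(np.int8) - 1)  # back to {-1, 0, 1}
        vals //= 3
    return np.stack(digits, axis=1).reshape(-1)[:n]

trits = np.random.randint(-1, 2, size=100).astype(np.int8)
packed = pack_trits(trits)
assert np.array_equal(unpack_trits(packed, trits.size), trits)
print(f"{trits.size} trits -> {packed.size} bytes")
```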