Entrit/Qwen2.5-3B-trit-uniform-d2

Text Generation · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-3B-trit-uniform-d2 is a 3.1 billion parameter Qwen2.5-3B model that has undergone balanced ternary post-training quantization (PTQ) by Entrit Systems. The quantization reduces the model's weights to an information content of 3.47 bits per weight, using 9 levels per weight at a depth of d=2. The model is optimized for efficient inference on hardware that can consume the packed trit format, making it suitable for resource-constrained environments.

Entrit/Qwen2.5-3B-trit-uniform-d2: Balanced Ternary Quantization

This model is a 3.1 billion parameter variant of the Qwen2.5-3B architecture, developed by Entrit Systems. It features a balanced ternary post-training quantization (PTQ) scheme, reducing its weights to an information content of 3.47 bits per weight with 9 levels per weight (depth d=2). This quantization is based on the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
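
For intuition, the following is a minimal sketch of what a uniform balanced ternary codec at depth d=2 does; it is illustrative only and not the tritllm-codec implementation, and the function names are hypothetical. Each weight is snapped to one of 9 integer levels in {-4, ..., 4}, and each level decomposes exactly into two balanced ternary digits (trits) in {-1, 0, 1}. The per-tensor scale used here is an assumption; the actual codec may use finer-grained scaling.

```python
import torch

def quantize_uniform_d2(w: torch.Tensor, eps: float = 1e-8):
    """Sketch of uniform balanced ternary PTQ at depth d=2 (9 levels).

    Levels are integers q in {-4, ..., 4}; each q decomposes into two
    trits (t1, t0) with q = 3*t1 + t0 and t1, t0 in {-1, 0, 1}.
    """
    # Per-tensor scale mapping the largest weight magnitude to level 4
    # (assumed here for simplicity; real codecs often scale per group).
    scale = w.abs().max().clamp_min(eps) / 4.0
    q = torch.round(w / scale).clamp(-4, 4)

    # Balanced ternary decomposition: high trit, then low trit.
    t1 = torch.round(q / 3.0).to(torch.int8)  # values in {-1, 0, 1}
    t0 = q.to(torch.int8) - 3 * t1            # values in {-1, 0, 1}
    return t1, t0, scale

def dequantize_uniform_d2(t1, t0, scale):
    """Reconstruct dense weights from the two trit planes."""
    return (3 * t1 + t0).float() * scale
```

Note that two trits carry log2(9) ≈ 3.17 bits of information per weight; the 3.47-bpw figure quoted above presumably also accounts for packing and scale metadata in the stored trit format.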

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-3B
  • Quantization Method: Uniform PTQ
  • Bits per Weight: 3.47
  • Depth: d=2 (9 levels)
  • Quantized Layers: All 2D linear matrices
  • FP16 Kept: lm_head, token embeddings, and all *_norm layers (see the selection sketch after this list)
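
As a rough illustration of the split above (a sketch, not the Entrit tooling; the helper name is hypothetical), the selection rule can be expressed over a Hugging Face module tree: quantize every 2-D nn.Linear weight, and leave lm_head, the embeddings, and the norm layers untouched.

```python
import torch.nn as nn

def select_quantizable(model: nn.Module):
    """Yield (name, module) pairs targeted by the ternary PTQ scheme."""
    for name, module in model.named_modules():
        # Only 2-D linear weight matrices are quantized.
        if isinstance(module, nn.Linear) and module.weight.dim() == 2:
            # lm_head is an nn.Linear but stays in FP16 per the card.
            if name.endswith("lm_head"):
                continue
            yield name, module
    # Token embeddings and *_norm modules are not nn.Linear instances,
    # so they fall outside this rule and remain in FP16 automatically.
```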

While the on-disk size remains similar to the FP16 source (the weights are stored dequantized for stock transformers compatibility), the 3.47-bpw figure reflects the efficiency achievable on specialized hardware that can consume the packed trit format directly. The tritllm-codec and tritllm-kernel projects provide the codec and inference kernels underlying this quantization.
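
Since the checkpoint ships dequantized weights, it should load like any other Qwen2.5 checkpoint with stock transformers. A minimal, untested usage sketch under that assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-3B-trit-uniform-d2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```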

Good for

  • Deploying Qwen2.5-3B on specialized hardware in environments that require a reduced memory footprint and faster inference.
  • Research and development in efficient LLM quantization techniques, particularly balanced ternary methods.
  • Applications where minor quantization-induced performance degradation is an acceptable trade for reduced model size and faster inference.