Entrit/Qwen2.5-72B-trit-uniform-d2

Text generation · 72.7B parameters · 32k context · Transformer · Open weights · Apache-2.0 · Published: Apr 26, 2026

Entrit/Qwen2.5-72B-trit-uniform-d2 is a 72.7-billion-parameter causal language model: a balanced-ternary quantized version of Qwen/Qwen2.5-72B. It uses uniform post-training quantization (PTQ) at depth d=2, achieving 3.47 bits per weight (bpw). The model targets efficient inference on hardware that can consume packed trit formats directly, offering significant memory and compute savings for large language model deployment.


Model Overview

Entrit/Qwen2.5-72B-trit-uniform-d2 is a 72.7-billion-parameter language model derived from Qwen/Qwen2.5-72B, featuring balanced-ternary post-training quantization (PTQ). This quantization scheme, based on research by Stentzel (2026), reduces the information content of the model's 2D linear matrices to 3.47 bits per weight, using 9 levels per weight (depth d=2, i.e. 3² = 9 levels, encodable as two balanced trits per weight).
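The mapping onto the 9-level grid is plain uniform quantization. The sketch below illustrates the idea in PyTorch; the function names, the `group_size` parameter, and the per-group max-abs scale selection are assumptions for illustration, not the actual Entrit/tritllm-codec implementation.

```python
import torch

def quantize_uniform_d2(w: torch.Tensor, group_size: int = 128):
    """Uniformly quantize weights to the 9 balanced levels {-4, ..., 4}.

    Illustrative sketch only: the real codec's grouping and scale
    selection may differ. Assumes w.numel() is divisible by group_size.
    """
    groups = w.reshape(-1, group_size)
    # One scale per group, chosen so the largest magnitude maps to level 4.
    scale = groups.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 4.0
    q = torch.clamp(torch.round(groups / scale), -4, 4)
    return q.to(torch.int8).reshape(w.shape), scale

def dequantize_uniform_d2(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct approximate FP16 weights from the quantized levels."""
    groups = q.reshape(scale.shape[0], -1).to(torch.float32)
    return (groups * scale).reshape(q.shape).to(torch.float16)
```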

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-72B
  • Quantization Method: Uniform PTQ
  • Bits per Weight: 3.47 bpw (information content)
  • Quantized Layers: All 2D linear matrices
  • FP16 Layers: lm_head, token embeddings, and *_norm layers remain in FP16 for compatibility and performance; see the layer-selection sketch below.
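A sketch of the layer selection implied by the list above, using standard PyTorch module introspection. The filter function and its name are hypothetical; consult the tritllm-codec repository for the actual selection logic.

```python
from torch import nn

def is_quantization_target(name: str, module: nn.Module) -> bool:
    """True for 2D linear matrices; lm_head, token embeddings, and
    *_norm layers stay in FP16 per the model card. Hypothetical filter."""
    if "lm_head" in name:
        return False
    # Embeddings and norm layers are not nn.Linear, so they fall through here.
    return isinstance(module, nn.Linear) and module.weight.ndim == 2

# Usage (model: a loaded Qwen2.5 checkpoint):
# targets = [n for n, m in model.named_modules() if is_quantization_target(n, m)]
```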

Unique Characteristics

While the on-disk size remains similar to the FP16 source (the published weights are stored dequantized for standard transformers compatibility), the 3.47-bpw figure is what matters for hardware acceleration: the model is designed for specialized hardware that can process the packed trit format directly, enabling more efficient inference than with traditional FP16 weights. The quantization follows the standard convention of targeting the 2D linear matrices, which account for the bulk of the model's memory footprint.
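The savings come from the packed representation: each 9-level value q splits into two balanced trits via q = 3·t1 + t0 with t1, t0 ∈ {-1, 0, 1}, and trits pack densely into bytes. The scheme below (five trits per byte, since 3^5 = 243 ≤ 256) is one common choice, assumed here purely for illustration; the codec's actual on-disk layout may differ.

```python
import numpy as np

def to_trits(q: np.ndarray) -> np.ndarray:
    """Split 9-level values in {-4, ..., 4} into two balanced trits each:
    q = 3*t1 + t0, with t1, t0 in {-1, 0, 1}."""
    t0 = ((q + 1) % 3) - 1
    t1 = (q - t0) // 3
    return np.stack([t1, t0], axis=-1).reshape(-1)

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced trits five-per-byte (3**5 = 243 fits in a uint8).
    Illustrative layout only, not necessarily the codec's format."""
    u = (trits + 1).astype(np.uint16)              # {-1,0,1} -> {0,1,2}
    u = np.pad(u, (0, (-len(u)) % 5))              # pad to a multiple of 5
    digits = np.array([81, 27, 9, 3, 1], dtype=np.uint16)
    return (u.reshape(-1, 5) * digits).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_trits; returns the first n trits."""
    vals = packed.astype(np.int16)
    out = np.empty((len(vals), 5), dtype=np.int8)
    for i, d in enumerate([81, 27, 9, 3, 1]):
        out[:, i] = vals // d - 1                  # back to {-1,0,1}
        vals = vals % d
    return out.reshape(-1)[:n]
```

Five-per-byte packing is nearly optimal: it spends 8/5 = 1.6 bits per trit against the theoretical log2(3) ≈ 1.585, wasting only about 0.07 bits per byte.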

Reproducibility

The quantization process is fully reproducible using the Entrit/tritllm-codec repository, allowing users to verify the methodology and results.
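Because the published tensors are dequantized back to a standard transformers-compatible layout (see above), loading should work like any other Qwen2.5 checkpoint. A hedged example using the stock transformers API, assuming the repo ships standard safetensors and that accelerate is installed for device_map:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-72B-trit-uniform-d2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights ship dequantized to FP16 per the card
    device_map="auto",          # shard the 72.7B parameters across available GPUs
)

inputs = tokenizer("Balanced ternary is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```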