Name: Entrit/Qwen2.5-72B-trit-uniform-d3 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: Entrit

Entrit/Qwen2.5-72B-trit-uniform-d3: Balanced Ternary Quantization

This model is a 72.7-billion-parameter variant of the Qwen/Qwen2.5-72B base model, developed by Entrit Systems. It applies balanced ternary post-training quantization (PTQ), a technique designed to reduce model size and improve inference efficiency, particularly on specialized hardware. The quantization process is described in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

Source Model: Qwen/Qwen2.5-72B.
Quantization Method: Uniform PTQ at depth d = 3 (three balanced trits per weight), giving 3^3 = 27 levels per weight.
Bits per Weight (BPW): Achieves an information content of 5.05 BPW for the quantized matrices.
Quantized Layers: All 2D linear weight matrices are ternary-quantized.
FP16 Layers: lm_head, token embeddings, and all *_norm layers are kept in FP16 to preserve critical model components.
Codec: Utilizes the tritllm v2 codec, available in the Entrit/tritllm-codec repository.
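To make the d = 3 scheme concrete, here is a minimal NumPy sketch of uniform balanced ternary quantization. It assumes one absmax scale per output row; the card does not say how the tritllm v2 codec chooses scales or group sizes. Each weight is rounded to an integer code in [-13, 13], which is exactly the range three balanced trits cover (13 = 1 + 3 + 9), and each code is then split into trits in {-1, 0, 1}. The `is_ternary_target` filter is hypothetical and assumes Hugging Face Qwen2 parameter names (`lm_head`, `model.embed_tokens`, `input_layernorm`, `model.norm`).

```python
import numpy as np

DEPTH = 3                   # d = 3 balanced trits per weight
LEVELS = 3 ** DEPTH         # 27 representable levels
QMAX = (LEVELS - 1) // 2    # integer codes lie in [-13, 13]


def is_ternary_target(name: str, ndim: int) -> bool:
    """Hypothetical filter: quantize 2D weights; keep lm_head, embeddings and norms in FP16."""
    if ndim != 2:
        return False  # biases and norm weights are 1D
    if "lm_head" in name or "embed_tokens" in name:
        return False
    return "norm" not in name


def quantize_rows(w: np.ndarray):
    """Uniform symmetric quantization to 27 levels, one absmax scale per row (assumed)."""
    w = np.asarray(w, dtype=np.float32)
    absmax = np.max(np.abs(w), axis=1, keepdims=True)
    scale = np.where(absmax > 0, absmax / QMAX, 1.0).astype(np.float32)
    q = np.clip(np.rint(w / scale), -QMAX, QMAX).astype(np.int8)
    return q, scale


def to_trits(q: np.ndarray) -> np.ndarray:
    """Split integer codes into DEPTH balanced trits, least significant first."""
    r = np.asarray(q, dtype=np.int32)
    trits = np.empty(r.shape + (DEPTH,), dtype=np.int8)
    for i in range(DEPTH):
        t = (r + 1) % 3 - 1   # digit in {-1, 0, 1} with t == r (mod 3)
        trits[..., i] = t
        r = (r - t) // 3      # exact division, since r - t is a multiple of 3
    return trits


def from_trits(trits: np.ndarray) -> np.ndarray:
    """Recombine balanced trits into integer codes: sum of t_i * 3**i."""
    return (trits.astype(np.int32) * 3 ** np.arange(DEPTH)).sum(axis=-1)


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to float weights using the per-row scale."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 16)).astype(np.float32)
    q, scale = quantize_rows(w)
    trits = to_trits(q)
    assert np.array_equal(from_trits(trits), q)  # trit decomposition is lossless
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

Per-row absmax scaling is only one possible choice; per-group scales would fit the description equally well. Either way, the rounding error per weight is bounded by half a quantization step.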

Usage and Performance

The published weights are dequantized back to FP16 so that the model loads with the standard transformers library, which is why its on-disk size remains similar to that of the FP16 source. The 5.05 BPW figure instead indicates what is achievable on hardware that can process the packed trit format directly: there, the quantized matrices would need roughly a third of their FP16 storage (16 / 5.05 ≈ 3.2x smaller). This makes the model a candidate for applications that need a reduced memory footprint and faster computation with minimal accuracy loss relative to its full-precision counterpart.
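The 5.05 BPW figure can be put in context with a little arithmetic. Three trits carry log2(27) ≈ 4.75 bits, so 5.05 BPW sits about 0.3 bits above that floor; the card does not say what accounts for the difference (per-block scales or other metadata would be a plausible guess, not a stated fact). The sketch below uses a simple, hypothetical packing of five trits per byte (3^5 = 243 fits in 256 values); the actual byte layout of the tritllm v2 codec is not described here, and all function names are illustrative.

```python
import math

import numpy as np

TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256; illustrative packing, not the tritllm v2 layout


def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced trits {-1, 0, 1} into bytes, five base-3 digits per byte."""
    digits = trits.reshape(-1).astype(np.int64) + 1                   # {-1, 0, 1} -> {0, 1, 2}
    pad = (-digits.size) % TRITS_PER_BYTE
    digits = np.concatenate([digits, np.ones(pad, dtype=np.int64)])  # pad with trit 0
    powers = 3 ** np.arange(TRITS_PER_BYTE)
    return (digits.reshape(-1, TRITS_PER_BYTE) * powers).sum(axis=1).astype(np.uint8)


def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_trits; n is the number of trits originally packed."""
    powers = 3 ** np.arange(TRITS_PER_BYTE)
    digits = (packed.astype(np.int64)[:, None] // powers) % 3
    return (digits.reshape(-1)[:n] - 1).astype(np.int8)


def bits_per_weight(depth: int = 3) -> dict:
    """Storage cost per weight for the trit payload alone (no scales or metadata)."""
    return {
        "entropy_floor": depth * math.log2(3),            # log2(27) ~ 4.75 for d = 3
        "packed_5_per_byte": depth * 8 / TRITS_PER_BYTE,  # 4.8 for d = 3
    }


def footprint_bytes(n_weights: int, bpw: float) -> float:
    """Bytes needed to store n_weights at a given bits-per-weight rate."""
    return n_weights * bpw / 8
```

For a hypothetical 8192 x 8192 projection matrix, `footprint_bytes(8192 * 8192, 16)` gives its FP16 size and `footprint_bytes(8192 * 8192, 5.05)` its size at the card's rate; the ratio is 16 / 5.05 ≈ 3.17 regardless of matrix shape. Note that the FP16 layers (lm_head, embeddings, norms) are not reduced, so the whole-model saving is somewhat smaller than this per-matrix ratio.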
