Entrit/Qwen2.5-32B-trit-uniform-d3

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-32B-trit-uniform-d3 is a 32.8 billion parameter Qwen2.5 model from Entrit Systems, featuring balanced ternary post-training quantization (PTQ) at depth d=3. This scheme achieves an effective 5.05 bits per weight, substantially reducing the information content of the model's weights. It targets efficient inference on hardware that consumes packed trit formats, while remaining compatible with standard transformers by dequantizing to FP16 for general use. The model suits applications where memory footprint and inference speed are critical, with minimal performance degradation.
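The 5.05 BPW figure can be sanity-checked from the depth: at d=3 each weight carries three balanced-ternary digits, so the trit payload alone is log2(3³) ≈ 4.75 bits, and the remainder is presumably per-group scale overhead. The decomposition below is an assumption for illustration, not a documented breakdown of the format:

```python
import math

# Depth d = 3 balanced ternary: 3 trits per weight, 3**3 = 27 levels.
levels = 3 ** 3
payload_bits = math.log2(levels)          # trit payload per weight, ~4.755 bits

# The card reports 5.05 BPW overall; the difference is presumably per-group
# metadata (scales) amortized over the group size of 16.  This split is an
# assumption here, not a documented property of the codec.
overhead_per_weight = 5.05 - payload_bits
print(f"payload: {payload_bits:.3f} BPW, overhead: {overhead_per_weight:.3f} BPW")
```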


Entrit/Qwen2.5-32B-trit-uniform-d3: Balanced Ternary Quantization

This model is a 32.8 billion parameter variant of the Qwen2.5-32B architecture, developed by Entrit Systems. It implements a novel balanced ternary post-training quantization (PTQ) scheme, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

  • Quantization Method: Uniform PTQ applied to all 2D linear weight matrices.
  • Bits Per Weight (BPW): An effective 5.05 BPW, reflecting the information content of the quantized weights.
  • Depth: d=3, i.e., 3 trits per weight, giving 3³ = 27 levels.
  • Group Size: 16 weights per scaling group.
  • Codec: Utilizes the tritllm v2 codec, available in the Entrit/tritllm-codec repository.
  • FP16 Preservation: Key components like lm_head, token embeddings, and all *_norm layers remain in FP16 to preserve model integrity.

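As a concrete illustration of the scheme described above, here is a minimal NumPy sketch of uniform group quantization to 27 symmetric levels with one shared scale per group. The function names and the max-abs scale rule are illustrative assumptions; the actual tritllm v2 codec may differ:

```python
import numpy as np

def quantize_group(w, d=3):
    # Map a group of weights to integer codes in [-13, 13] (3**d = 27
    # levels for d = 3) with one shared scale per group.  Sketch only;
    # the real codec's scale format is not specified on this card.
    levels = (3 ** d - 1) // 2            # 13 for d = 3
    scale = float(np.abs(w).max()) / levels
    if scale == 0.0:
        scale = 1.0                       # all-zero group: any scale works
    codes = np.clip(np.round(w / scale), -levels, levels).astype(np.int8)
    return codes, np.float16(scale)

def dequantize_group(codes, scale):
    # Reconstruct floating-point weights, as done for standard
    # transformers use of this checkpoint.
    return codes.astype(np.float32) * np.float32(scale)
```

With group size 16, each group stores 16 three-trit codes plus one scale, and the rounding error per weight is bounded by half the group scale.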
Performance and Use Cases

The on-disk checkpoint matches the FP16 source in size, because the weights are stored dequantized for transformers compatibility. The 5.05 BPW figure matters for hardware designed to consume packed trit formats directly, where it translates into a smaller memory footprint and faster inference; on standard hardware the model runs like an ordinary FP16 checkpoint, with no specialized support required. For full evaluation results and technical specifics, refer to the associated paper.
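A packed trit format of the kind alluded to above can be sketched as follows: five trits fit in one byte (3⁵ = 243 ≤ 256), i.e. 1.6 bits per trit, so three trits per weight cost 4.8 bits before scale overhead. This little-endian base-3 layout is an illustrative assumption, not the actual tritllm v2 wire format:

```python
def pack_trits(trits):
    # Pack trits in {-1, 0, 1} five-per-byte as base-3 digits
    # (3**5 = 243 fits in one byte).  Illustrative layout only.
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):
            val = val * 3 + (t + 1)       # shift to unsigned digit {0, 1, 2}
        out.append(val)
    return bytes(out)

def unpack_trits(data, n):
    # Inverse of pack_trits: recover the first n trits.
    trits = []
    for b in data:
        for _ in range(5):
            trits.append(b % 3 - 1)
            b //= 3
    return trits[:n]
```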