Entrit/Qwen2.5-32B-trit-uniform-d2

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-32B-trit-uniform-d2 is a 32.8 billion parameter language model from Entrit, based on Qwen/Qwen2.5-32B, featuring balanced ternary post-training quantization. It uses a uniform quantization method at depth d=2, resulting in 3.47 bits per weight for its 2D linear matrices. This model is optimized for efficient inference on hardware that can directly consume its packed trit format, offering a compact representation of the original Qwen2.5-32B.
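As a quick sanity check on the level count (a sketch, not part of the model card's tooling): depth d=2 means two trits per weight, giving 3² = 9 representable levels. Note that the raw information content of two trits is d·log₂(3) ≈ 3.17 bits; the quoted 3.47 bpw is the model card's figure and presumably also accounts for scales or other metadata in the packed format.

```python
import math

d = 2                        # balanced-ternary depth: two trits per weight
levels = 3 ** d              # 9 representable values per weight: -4 ... +4
raw_bits = d * math.log2(3)  # raw information content of two trits

print(levels)                # 9
print(round(raw_bits, 2))    # 3.17
```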


Overview

Entrit/Qwen2.5-32B-trit-uniform-d2 is a 32.8 billion parameter large language model derived from Qwen/Qwen2.5-32B. Developed by Entrit, this model implements a balanced ternary post-training quantization (PTQ) scheme, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-32B
  • Quantization Method: Uniform balanced-ternary PTQ at depth d=2 (two trits per weight), yielding 3² = 9 levels per weight.
  • Bits per Weight: Achieves an information content of 3.47 bits per weight for quantized matrices.
  • Quantized Components: All 2D linear matrices are ternary-quantized.
  • FP16 Components: lm_head, token embeddings, and all *_norm layers remain in FP16 for compatibility and performance.
  • Codec: Utilizes the tritllm v2 codec, available in the Entrit/tritllm-codec repository.
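The depth-2 scheme above can be sketched as follows. This is a hypothetical illustration, not the tritllm v2 codec: each weight is rounded to the nearest of 9 uniform levels (an integer q in [-4, 4] times a per-tensor scale), and q is then split into two balanced-ternary digits t1, t0 ∈ {-1, 0, 1} with q = 3·t1 + t0.

```python
def quantize_trit_d2(weights):
    """Sketch of uniform balanced-ternary PTQ at depth d=2 (assumed scheme).

    Maps each weight to an integer q in [-4, 4] via a per-tensor scale,
    then decomposes q into two trits (t1, t0) with q = 3*t1 + t0.
    """
    scale = max(abs(w) for w in weights) / 4 or 1.0
    trits, dequant = [], []
    for w in weights:
        q = max(-4, min(4, round(w / scale)))
        t0 = ((q + 1) % 3) - 1          # least-significant trit in {-1, 0, 1}
        t1 = (q - t0) // 3              # most-significant trit in {-1, 0, 1}
        trits.append((t1, t0))
        dequant.append((3 * t1 + t0) * scale)
    return scale, trits, dequant

scale, trits, deq = quantize_trit_d2([0.9, -0.31, 0.05, -0.88])
print(trits)  # [(1, 1), (0, -1), (0, 0), (-1, -1)]
```

The per-tensor max-abs scale here is a simplifying assumption; real PTQ pipelines typically use per-group or per-channel scales chosen to minimize quantization error.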

Performance and Usage

Because the weights are stored dequantized for compatibility with standard transformers tooling, the on-disk size matches the FP16 source; the 3.47-bpw figure applies to inference on specialized hardware that consumes the packed trit format directly. The model targets scenarios where a reduced memory footprint and potentially faster inference (on compatible hardware) are critical, without significant loss in capability relative to its FP16 base.
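The packed trit layout itself is not documented here. One plausible packing, shown purely for illustration (the actual tritllm v2 layout may differ), stores five trits per byte, since 3⁵ = 243 ≤ 256:

```python
def pack_trits(trits):
    """Pack trits (values in {-1, 0, 1}) five to a byte: 3**5 = 243 <= 256.

    Illustrative sketch only; the real tritllm v2 packed format may differ.
    """
    out = bytearray()
    for i in range(0, len(trits), 5):
        val = 0
        for t in reversed(trits[i:i + 5]):  # base-3, first trit ends up lowest
            val = val * 3 + (t + 1)         # shift {-1, 0, 1} to {0, 1, 2}
        out.append(val)
    return bytes(out)

def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original trit count."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```

At five trits per byte this costs 1.6 bits per trit, close to the log₂(3) ≈ 1.585-bit information content of a trit.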