Name: Entrit/Qwen2.5-72B-trit-uniform-d4 API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: Entrit

Entrit/Qwen2.5-72B-trit-uniform-d4: Balanced Ternary Quantization

This model is a quantized version of the 72.7-billion-parameter Qwen2.5-72B base model, produced by Entrit Systems. It was compressed with balanced ternary post-training quantization (PTQ) using the uniform method at depth d=4. This gives 3^4 = 81 levels per weight and an effective 6.64 bits per weight. The method follows "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
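With d balanced-ternary digits (trits), each in {-1, 0, +1}, a weight code can take 3^d values. At d=4 that is 3^4 = 81 levels, covering the integers -40 to +40. The sketch below illustrates that encoding. The function names are hypothetical and are not part of tritllm.

The ideal information content of 81 levels is log2(81) ≈ 6.34 bits. The card's 6.64-bpw figure presumably also covers codec overhead such as per-group scales, but the card does not give a breakdown.

```python
import math

def balanced_ternary_digits(value: int, depth: int) -> list[int]:
    """Hypothetical helper: encode an integer as `depth` balanced-ternary
    digits in {-1, 0, +1}, least significant digit first."""
    limit = (3**depth - 1) // 2          # 40 when depth == 4
    if not -limit <= value <= limit:
        raise ValueError(f"{value} is out of range for depth {depth}")
    digits = []
    for _ in range(depth):
        r = value % 3                    # Python's % is non-negative: 0, 1 or 2
        if r == 2:                       # remainder 2 becomes digit -1 (with a carry)
            r = -1
        digits.append(r)
        value = (value - r) // 3
    return digits

def from_balanced_ternary(digits: list[int]) -> int:
    """Inverse of balanced_ternary_digits: sum of digit * 3**position."""
    return sum(t * 3**i for i, t in enumerate(digits))

depth = 4
levels = 3**depth
print(levels)                                 # 81
print(f"{math.log2(levels):.2f}")             # 6.34 (ideal bits per weight)
print(balanced_ternary_digits(17, depth))     # [-1, 0, -1, 1]
print(balanced_ternary_digits(-40, depth))    # [-1, -1, -1, -1]
```

Every code in -40..+40 round-trips through these two functions. This is why four trits are enough to address all 81 levels.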

Key Quantization Details

Source Model: Qwen/Qwen2.5-72B
Quantization Method: Uniform PTQ, applied to all 2D linear weight matrices.
Depth: d=4 (3^4 = 81 levels), yielding 6.64 bits per weight.
Group Size: 16
FP16 Components: lm_head, token embeddings, and all *_norm layers are kept in FP16 to preserve these sensitive components.
Codec: tritllm v2.
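The card does not state the scaling rule that tritllm v2 uses. The sketch below shows how the settings above could fit together, under two assumptions the source does not confirm:

- Symmetric absmax scaling with one scale per group of 16 weights, a common choice for uniform PTQ.
- FP16 layers are identified by name substring, based on typical Qwen2 parameter names such as `model.embed_tokens.weight` and `model.layers.0.input_layernorm.weight`.

All names in the sketch (`should_quantize`, `quantize_group`, `quantize_row`, and so on) are hypothetical.

```python
GROUP_SIZE = 16
DEPTH = 4
QMAX = (3**DEPTH - 1) // 2   # 40: integer codes span -40..+40 (81 levels)

# Assumption: substrings identifying the parameters the card keeps in FP16.
FP16_KEYWORDS = ("lm_head", "embed_tokens", "norm")

def should_quantize(name: str, shape: tuple[int, ...]) -> bool:
    """Quantize only 2D weight matrices that are not kept in FP16."""
    return len(shape) == 2 and not any(k in name for k in FP16_KEYWORDS)

def quantize_group(weights: list[float]) -> tuple[float, list[int]]:
    """Absmax-scale one group, then round each weight to the nearest
    integer level (ties to even) and clamp to [-QMAX, QMAX]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / QMAX if absmax > 0 else 1.0
    codes = [max(-QMAX, min(QMAX, round(w / scale))) for w in weights]
    return scale, codes

def dequantize_group(scale: float, codes: list[int]) -> list[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [scale * q for q in codes]

def quantize_row(row: list[float]) -> list[tuple[float, list[int]]]:
    """Split one matrix row into GROUP_SIZE chunks, each with its own scale."""
    return [quantize_group(row[i:i + GROUP_SIZE])
            for i in range(0, len(row), GROUP_SIZE)]

# Usage on a synthetic 32-element row (two groups of 16).
row = [0.01 * ((i * 37) % 23 - 11) for i in range(32)]
groups = quantize_row(row)
restored = [w for s, c in groups for w in dequantize_group(s, c)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)
```

Under these assumptions, the rounding error of each weight is at most half its group's scale. Small groups such as 16 keep each scale close to the local weight range.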

Performance and Use Cases

The published checkpoint stores the quantized weights dequantized back to FP16 so that it loads with standard transformers, which means its on-disk size matches the FP16 source. The 6.64-bpw figure applies only when inference runs on specialized hardware or runtimes that consume the packed trit format directly. In that setting the approach aims to reduce the memory footprint and compute cost of inference, making large language models cheaper to deploy. The quantization process and codec can be reproduced with the tritllm-codec repository.
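The short calculation below makes the 6.64-bpw figure concrete. It estimates weight storage for 72.7 billion parameters at FP16 versus the packed rate.

It is a back-of-the-envelope sketch with two simplifications:

- It applies 6.64 bpw to every parameter, ignoring that lm_head, embeddings, and norm layers stay in FP16.
- It leaves out activations, KV cache, and runtime overhead.

Real savings will therefore be somewhat smaller.

```python
PARAMS = 72.7e9     # total parameter count stated on the card
FP16_BITS = 16      # bits per weight in the FP16 source
TRIT_BPW = 6.64     # effective bits per weight stated on the card

def footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

fp16_gb = footprint_gb(PARAMS, FP16_BITS)
packed_gb = footprint_gb(PARAMS, TRIT_BPW)
print(f"FP16: {fp16_gb:.1f} GB, packed trits: {packed_gb:.1f} GB, "
      f"ratio: {fp16_gb / packed_gb:.2f}x")
# FP16: 145.4 GB, packed trits: 60.3 GB, ratio: 2.41x
```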
