# Entrit/Qwen2.5-7B-trit-uniform-d2
Entrit/Qwen2.5-7B-trit-uniform-d2 is a 7.6-billion-parameter language model based on the Qwen2.5-7B architecture, developed by Entrit Systems. It applies balanced ternary post-training quantization at depth d=2, yielding 3.47 bits per weight. The model is designed for efficient inference on hardware that can process packed trit formats, offering a highly compressed representation of the original Qwen2.5-7B model.
## Overview
Entrit/Qwen2.5-7B-trit-uniform-d2 is a quantized version of the Qwen/Qwen2.5-7B large language model. It implements the balanced ternary post-training quantization (PTQ) scheme detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026), using a depth of d=2, which gives 3^2 = 9 levels per weight at 3.47 bits per weight (versus 16 bits for the FP16 source).
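For intuition, the quoted bit count can be reconstructed from the numbers on this card. The sketch below (plain Python; the decoding convention `t1*3 + t0` is an illustrative assumption, not the documented tritllm v2 layout) enumerates the 9 levels of a depth-2 balanced ternary code and adds the per-group scale overhead implied by the 27-entry scale codebook and group size 16 listed below:

```python
import math

# Depth d=2 balanced ternary: each weight is a pair of trits (t1, t0),
# each trit in {-1, 0, +1}, decoded here as t1*3 + t0 (assumed convention).
levels = sorted(t1 * 3 + t0 for t1 in (-1, 0, 1) for t0 in (-1, 0, 1))
print(levels)        # [-4, -3, -2, -1, 0, 1, 2, 3, 4] -- 3^2 = 9 levels

# log2(9) bits per weight for the trits, plus the per-group scale index
# (one of 27 codebook entries, shared across a group of 16 weights).
bpw = math.log2(9) + math.log2(27) / 16
print(f"{bpw:.2f}")  # 3.47
```

The close match to the quoted 3.47 suggests the figure covers both the trits and the shared scale indices, although the card describes it only as the information content of the quantized matrices.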
## Key Quantization Details
- Source Model: Qwen/Qwen2.5-7B
- Quantization Method: Uniform PTQ with a balanced ternary codec (tritllm v2).
- Bits per Weight: 3.47, representing the information content of the quantized matrices.
- Quantized Layers: All 2D linear matrices are quantized; `lm_head`, token embeddings, and `*_norm` layers remain in FP16 for compatibility and performance.
- Group Size: 16, with a 27-entry log-spaced scale codebook (a sketch of this scheme follows the list).
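To make the group-wise scheme concrete, here is a minimal sketch of a symmetric group quantizer under the parameters above. The codebook range (2^-8 to 2^0), the nearest-scale selection, and round-to-nearest level assignment are illustrative assumptions, not the tritllm v2 specification:

```python
import numpy as np

GROUP_SIZE = 16
# 27 log-spaced scales; the actual codebook range is not documented here.
CODEBOOK = np.logspace(-8, 0, num=27, base=2.0)

def quantize_group(w):
    """Quantize one group of 16 FP weights to (scale index, integer levels)."""
    target = np.abs(w).max() / 4.0                   # map the largest weight near level +/-4
    s_idx = int(np.abs(CODEBOOK - target).argmin())  # nearest codebook scale
    q = np.clip(np.round(w / CODEBOOK[s_idx]), -4, 4).astype(np.int8)
    return s_idx, q                                  # q holds the 9 levels {-4, ..., +4}

def dequantize_group(s_idx, q):
    return CODEBOOK[s_idx] * q.astype(np.float32)

# Round trip on one random group of weights.
w = np.random.randn(GROUP_SIZE).astype(np.float32) * 0.02
s_idx, q = quantize_group(w)
print("max abs error:", np.abs(w - dequantize_group(s_idx, q)).max())
```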
## Usage and Performance
Because the published weights are dequantized for standard transformers compatibility, the on-disk size remains similar to the FP16 source. The true benefit of this model lies in its optimized format for hardware that can consume packed trit data directly, making it particularly suitable for scenarios requiring highly efficient inference with a reduced memory footprint at the hardware level. The model can be loaded with standard transformers library functions, with weights arriving as FP16.
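Loading follows the usual transformers pattern; the snippet below assumes a recent transformers release with native Qwen2.5 support and `accelerate` installed for `device_map="auto"`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights arrive as FP16, per this card
    device_map="auto",
)

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

As for the packed trit path, the card does not specify tritllm's packing; one common scheme for trit data, shown here purely as an assumption, packs five trits per byte, since 3^5 = 243 ≤ 256:

```python
def pack5(trits):
    """Pack 5 balanced trits (each in {-1, 0, +1}) into one byte, base-3."""
    b = 0
    for t in trits:
        b = b * 3 + (t + 1)  # shift each trit to an unsigned digit {0, 1, 2}
    return b                 # 0..242, fits in a single byte

def unpack5(b):
    trits = []
    for _ in range(5):
        trits.append(b % 3 - 1)
        b //= 3
    return trits[::-1]

assert unpack5(pack5([1, -1, 0, 1, -1])) == [1, -1, 0, 1, -1]
```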