Entrit/Qwen2.5-3B-trit-uniform-d1
Entrit/Qwen2.5-3B-trit-uniform-d1 is a 3.1-billion-parameter language model based on the Qwen2.5-3B architecture, featuring balanced ternary post-training quantization. Developed by Entrit, this model uses a 1.88 bits-per-weight scheme, reducing the information content of its weights by roughly 8.5× relative to FP16. It is designed for efficient inference on specialized hardware that can directly consume its packed trit format, making it suitable for resource-constrained environments.
Model Overview
Entrit/Qwen2.5-3B-trit-uniform-d1 is a quantized version of the Qwen/Qwen2.5-3B large language model, developed by Entrit. This model employs a novel balanced ternary post-training quantization (PTQ) method, as described in "Balanced Ternary Post-Training Quantization for Large Language Models" by Stentzel (2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-3B
- Quantization Depth: d=1 (3 levels per weight: −1, 0, +1)
- Bits per Weight (BPW): 1.88 BPW, roughly an 8.5× reduction in information content compared to standard FP16 weights (see the packing sketch after this list).
- Method: Uniform PTQ applied to all 2D linear-layer weight matrices.
- Codec: Utilizes `tritllm v2` for quantization, with the source available at Entrit/tritllm-codec.
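The exact on-disk format is defined by the codec in Entrit/tritllm-codec. As an illustration of the general idea only, the sketch below quantizes a weight matrix to balanced ternary with a per-row scale (the 0.7 × mean|w| threshold is a common absmean-style heuristic, not necessarily what `tritllm v2` uses) and packs five trits into each byte, since 3^5 = 243 ≤ 256. That yields 1.6 bits per trit, with scales and metadata plausibly accounting for the gap up to the reported 1.88 BPW.

```python
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Illustrative balanced-ternary PTQ (not the actual tritllm v2 codec).

    Uses a per-row scale and an absmean-style threshold to map weights
    to trits in {-1, 0, +1}.
    """
    scale = np.abs(w).mean(axis=1, keepdims=True)             # per-row scale
    trits = np.where(np.abs(w) > 0.7 * scale, np.sign(w), 0.0)
    return trits.astype(np.int8), scale.ravel()

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack 5 balanced trits per byte: 3**5 = 243 <= 256, i.e. 1.6 bits/trit."""
    flat = trits.ravel().astype(np.int64) + 1                 # {-1,0,1} -> {0,1,2}
    flat = np.concatenate([flat, np.zeros((-len(flat)) % 5, dtype=np.int64)])
    codes = flat.reshape(-1, 5) @ (3 ** np.arange(5))         # base-3 encoding
    return codes.astype(np.uint8)

w = np.random.randn(256, 256).astype(np.float32)
trits, scale = quantize_ternary(w)
packed = pack_trits(trits)
print(f"{packed.nbytes * 8 / w.size:.2f} bits per weight before scale overhead")
```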
Performance and Usage
While the published weights are dequantized to FP16 for compatibility with the standard `transformers` library, the model's true efficiency benefit lies in its reduced information content. This makes it particularly advantageous for inference on hardware specifically designed to process the packed trit format directly, leveraging kernels like those found in Entrit/tritllm-kernel.
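Because the checkpoint stores dequantized FP16 weights, it should load like any other causal LM. A minimal sketch, assuming the repository ships a standard Qwen2.5-style config and tokenizer (the prompt and generation settings here are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-3B-trit-uniform-d1"

# The stored weights are already dequantized to FP16, so this behaves
# like loading any other Qwen2.5-3B checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Balanced ternary is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```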
Good for
- Resource-constrained inference: Ideal for deployments where memory footprint and computational efficiency are critical, provided compatible hardware is used.
- Research into quantization techniques: Offers a practical example of balanced ternary quantization for LLMs.
- Exploring novel hardware acceleration: Suitable for use cases targeting specialized hardware that can natively process ternary weights (see the sketch after this list).
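To see why ternary-native hardware helps, note that a matrix-vector product against weights in {−1, 0, +1} needs no weight multiplications at all: each output element is a signed sum of activations, rescaled per row. The real kernels in Entrit/tritllm-kernel operate on the packed trit format; the toy sketch below, with made-up random trits and scales, only demonstrates the arithmetic identity.

```python
import numpy as np

def ternary_matvec(trits, scale, x):
    """Each output element is a signed sum of activations: no weight
    multiplications are needed when the weights are in {-1, 0, +1}."""
    out = np.array([x[row == 1].sum() - x[row == -1].sum() for row in trits])
    return out * scale

rng = np.random.default_rng(0)
trits = rng.integers(-1, 2, size=(4, 16)).astype(np.int8)  # stand-in ternary weights
scale = rng.random(4).astype(np.float32)                   # stand-in per-row scales
x = rng.standard_normal(16).astype(np.float32)

# Matches the dense FP computation with the dequantized weights.
assert np.allclose(ternary_matvec(trits, scale, x),
                   (trits * scale[:, None]) @ x, atol=1e-4)
```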