Entrit/Qwen2.5-3B-trit-uniform-d3

Text generation · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Entrit/Qwen2.5-3B-trit-uniform-d3 is a 3.1 billion parameter language model based on the Qwen2.5-3B architecture, featuring balanced ternary post-training quantization. The model uses a quantization depth of d=3 (three trits per weight), giving 27 levels per weight and an information content of 5.05 bits per weight. Developed by Entrit Systems, this quantization scheme targets efficient inference on hardware that can consume packed trit formats directly, offering a compact representation of the original Qwen2.5-3B model for resource-constrained environments.


Model Overview

Entrit/Qwen2.5-3B-trit-uniform-d3 is a quantized version of the Qwen/Qwen2.5-3B large language model, developed by Entrit Systems. This model implements a balanced ternary post-training quantization (PTQ) scheme, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" by Eric Stentzel (2026).

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-3B
  • Quantization Method: Uniform PTQ with a depth of d=3, yielding 27 levels per weight.
  • Bits per Weight: Achieves an information content of 5.05 bits per weight.
  • Codec: Utilizes tritllm v2 for quantization, with the source available at Entrit/tritllm-codec.
  • Quantized Layers: All 2D linear matrices are quantized.
  • FP16 Layers: lm_head, token embeddings, and all *_norm layers are kept in FP16 precision.
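To make the scheme above concrete, here is a minimal NumPy sketch of uniform balanced-ternary PTQ at depth d=3: weights are scaled per tensor, rounded to the 27 integer levels -13..13, and each level is decomposed into three balanced trits. This is an illustrative re-implementation under stated assumptions, not the tritllm v2 codec; all function names are hypothetical.

```python
import numpy as np

def quantize_balanced_ternary(w, d=3):
    """Map a weight tensor to 3**d balanced-ternary levels (27 for d=3).

    Levels are the integers -(3**d - 1)//2 .. +(3**d - 1)//2, scaled
    per-tensor by the max absolute weight.
    """
    half_range = (3**d - 1) // 2                      # 13 for d=3
    scale = float(np.max(np.abs(w))) / half_range
    q = np.clip(np.round(w / scale), -half_range, half_range).astype(np.int32)
    return q, scale

def to_trits(q, d=3):
    """Decompose each quantized integer into d balanced trits in {-1, 0, 1}."""
    trits = np.empty(q.shape + (d,), dtype=np.int8)
    v = q.astype(np.int64)
    for i in range(d):
        r = ((v + 1) % 3) - 1                         # balanced remainder
        trits[..., i] = r
        v = (v - r) // 3
    return trits                                      # q == sum(trits[i] * 3**i)

def dequantize(q, scale):
    """Recover approximate FP weights from quantized levels."""
    return q.astype(np.float32) * scale
```

The rounding step bounds the per-weight reconstruction error by half a level, i.e. `scale / 2` for the per-tensor scale used here; the actual codec may use finer-grained (e.g. per-channel) scales.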

Performance and Use Cases

While the on-disk size of this model is equivalent to the FP16 source due to dequantization for standard transformers compatibility, its 5.05-bpw figure is crucial for inference on specialized hardware that can directly process the packed trit format. This makes the model particularly relevant for scenarios requiring reduced memory footprint and potentially faster inference when deployed with compatible hardware and kernels (e.g., Entrit/tritllm-kernel). It is ideal for developers exploring efficient deployment of LLMs in resource-constrained environments or those interested in advanced quantization techniques.
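As a rough illustration of what a packed trit format can look like (an assumption for exposition, not the tritllm v2 wire format), five balanced trits fit in a single byte, since 3**5 = 243 ≤ 256:

```python
import numpy as np

def pack_trits(trits):
    """Pack balanced trits {-1, 0, 1} into bytes, 5 trits per byte.

    Illustrative packing only; the real tritllm v2 format may differ.
    """
    t = np.asarray(trits, dtype=np.int64).ravel() + 1  # shift to {0, 1, 2}
    pad = (-t.size) % 5                                # pad to a multiple of 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.int64)])
    powers = 3 ** np.arange(5)                         # base-3 place values
    return (t.reshape(-1, 5) @ powers).astype(np.uint8)

def unpack_trits(packed, n):
    """Recover the first n balanced trits from a packed byte array."""
    digits = (packed.astype(np.int64)[:, None] // (3 ** np.arange(5))) % 3
    return digits.ravel()[:n] - 1                      # shift back to {-1, 0, 1}
```

Hardware or kernels that decode this kind of representation on the fly avoid ever materializing the FP16 weights, which is where the memory-footprint benefit comes from.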