Entrit/Qwen2.5-0.5B-trit-uniform-d3
Entrit/Qwen2.5-0.5B-trit-uniform-d3 is a 0.5-billion-parameter Qwen2.5 model that has undergone balanced ternary post-training quantization (PTQ) at depth d=3, yielding 5.05 bits per weight. Developed by Entrit Systems, it is optimized for efficient inference on hardware that consumes the packed trit format directly, offering a highly compressed representation of the original Qwen2.5-0.5B for scenarios where a reduced memory footprint and fast processing of quantized weights are critical.
Model Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d3 is a quantized version of the Qwen/Qwen2.5-0.5B language model, developed by Entrit Systems. It uses balanced ternary post-training quantization (PTQ) at a depth of d=3, which gives 27 levels per weight (log2(27) ≈ 4.75 bits); together with the per-group scale index, this works out to 5.05 bits per weight. The quantization method is uniform PTQ, applied to all 2D linear matrices in the model.
Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Depth: d=3 (27 levels)
- Bits per Weight: 5.05 bpw (derivation sketched after this list)
- Group Size: 16
- Scale Codebook: 27-entry log-spaced (scale_depth=3)
- Quantized Layers: All 2D linear matrices
- FP16 Kept: lm_head, token embeddings, and all `*_norm` layers remain in FP16.
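The listed parameters pin down the 5.05 bpw figure: each weight carries log2(27) ≈ 4.75 bits of trit payload, and each group of 16 weights shares one index into the 27-entry scale codebook, adding log2(27)/16 ≈ 0.30 bits per weight. The sketch below illustrates this arithmetic together with a minimal group quantizer; the codebook range and the round-to-nearest rule are illustrative assumptions, not the exact codec from the paper.

```python
import numpy as np

DEPTH = 3                       # balanced-ternary digits per weight
LEVELS = 3 ** DEPTH             # 27 levels: integers in [-13, +13]
QMAX = (LEVELS - 1) // 2        # 13
GROUP_SIZE = 16                 # weights sharing one scale index
SCALE_LEVELS = 27               # 27-entry scale codebook (scale_depth=3)

# Hypothetical log-spaced codebook; the real range and spacing are set by
# the Entrit codec and are assumed here purely for illustration.
scale_codebook = np.logspace(-4, 0, SCALE_LEVELS)

def quantize_group(w):
    """Round-to-nearest quantization of one 16-weight group."""
    ideal = np.abs(w).max() / QMAX                   # ideal per-group scale
    s_idx = np.abs(scale_codebook - ideal).argmin()  # snap to codebook entry
    q = np.clip(np.round(w / scale_codebook[s_idx]), -QMAX, QMAX).astype(int)
    return q, s_idx

def dequantize_group(q, s_idx):
    return q * scale_codebook[s_idx]

# Information content: trit payload plus amortized scale index.
bpw = np.log2(LEVELS) + np.log2(SCALE_LEVELS) / GROUP_SIZE
print(f"{bpw:.2f} bpw")  # -> 5.05

# Round-trip one group to see the reconstruction error.
w = 0.02 * np.random.randn(GROUP_SIZE).astype(np.float32)
q, s_idx = quantize_group(w)
print(np.abs(dequantize_group(q, s_idx) - w).max())
```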
Performance and Use Cases
While the on-disk size remains similar to the FP16 source (weights are stored dequantized for stock-transformers compatibility), the 5.05-bpw figure reflects the model's reduced information content. This makes it particularly suitable for inference on specialized hardware that consumes the packed trit format directly, as supported by Entrit/tritllm-kernel. The quantization follows the codec described in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
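Because the checkpoint ships dequantized FP16 weights, it should load like any other Qwen2.5 model with stock transformers. The snippet below is a standard loading example under that assumption; the packed-trit path through Entrit/tritllm-kernel would use that kernel's own API instead.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-0.5B-trit-uniform-d3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Quick generation check with the dequantized FP16 weights.
inputs = tokenizer("Balanced ternary quantization", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```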
This model is ideal for applications requiring highly efficient and memory-optimized LLM inference, especially where custom hardware or specialized kernels can leverage the balanced ternary quantization for faster computation.