Entrit/Qwen2.5-0.5B-trit-uniform-d1
Entrit/Qwen2.5-0.5B-trit-uniform-d1 is a 0.5 billion parameter Qwen2.5 model developed by Entrit, featuring balanced ternary post-training quantization at 1.88 bits per weight. This model is optimized for efficient inference on hardware supporting packed trit formats, offering significant memory and computational savings. It is derived from Qwen/Qwen2.5-0.5B and is suitable for applications requiring reduced model footprint and faster processing.
Model Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d1 is a quantized version of the Qwen/Qwen2.5-0.5B language model, developed by Entrit. This model employs balanced ternary post-training quantization at a depth of d=1, resulting in an efficient 1.88 bits per weight. The quantization process is based on the codec described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Method: Uniform Post-Training Quantization (PTQ)
- Bits per Weight: 1.88 bpw (3 levels per weight)
- Group Size: 16
- Quantized Layers: All 2D linear matrices
- FP16 Kept: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for compatibility and performance.
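
The details above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of group-wise balanced ternary quantization at depth d=1, not the Entrit codec itself: each group of 16 weights gets one scale (mean absolute value is used here as a simple choice) and its weights are rounded to the levels {-1, 0, +1}.

```python
import torch

GROUP_SIZE = 16  # matches the group size reported above

def quantize_ternary(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative group-wise balanced ternary quantization.

    Each row is split into groups of 16 weights; every group gets one
    FP16 scale and its weights are rounded to the levels {-1, 0, +1}.
    This is a generic PTQ sketch, not the exact Entrit codec.
    """
    rows, cols = w.shape
    groups = w.reshape(rows, cols // GROUP_SIZE, GROUP_SIZE)
    # One scale per group; the mean absolute value is a simple, common choice.
    scale = groups.abs().mean(dim=-1, keepdim=True).clamp(min=1e-8)
    trits = torch.clamp(torch.round(groups / scale), -1, 1)  # levels {-1, 0, +1}
    return trits.to(torch.int8), scale.to(torch.float16)

def dequantize_ternary(trits: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an FP16 weight matrix from trits and per-group scales."""
    rows = trits.shape[0]
    return (trits.float() * scale.float()).reshape(rows, -1).to(torch.float16)

# Round-trip example on a random 2D linear weight.
w = torch.randn(64, 128)
trits, scale = quantize_ternary(w)
w_hat = dequantize_ternary(trits, scale)
print("max abs error:", (w - w_hat.float()).abs().max().item())
```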
Performance and Compatibility
The checkpoint is stored dequantized to FP16 for compatibility with stock transformers, so its on-disk size is similar to the FP16 source; the 1.88 bpw figure reflects the true information content of the quantized weights. The model is therefore most efficient on specialized hardware that can directly consume the packed trit format, leveraging the Entrit/tritllm-kernel.
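
To illustrate why packed trits are compact, the sketch below packs five balanced trits into a single byte (3^5 = 243 ≤ 256), i.e. 1.6 bits per trit before per-group scale overhead. This is a generic base-3 packing scheme for illustration only; the actual layout consumed by Entrit/tritllm-kernel may differ.

```python
def pack_trits(trits: list[int]) -> bytes:
    """Pack balanced trits (-1, 0, +1) five per byte using base-3 coding.

    3**5 = 243 fits in one byte, so five trits cost 8 bits (1.6 bits/trit).
    Illustrative only; the real packed format may be laid out differently.
    """
    out = bytearray()
    for i in range(0, len(trits), 5):
        chunk = trits[i:i + 5] + [0] * (5 - len(trits[i:i + 5]))  # pad last chunk
        value = 0
        for t in reversed(chunk):
            value = value * 3 + (t + 1)  # map {-1, 0, +1} -> {0, 1, 2}
        out.append(value)
    return bytes(out)

def unpack_trits(data: bytes, n: int) -> list[int]:
    """Inverse of pack_trits: recover the first n balanced trits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)  # map {0, 1, 2} -> {-1, 0, +1}
            byte //= 3
        if len(trits) >= n:
            break
    return trits[:n]

# Round trip: 16 trits (one quantization group) pack into 4 bytes.
group = [1, -1, 0, 0, 1, 1, -1, 0, 1, 0, -1, -1, 0, 1, 1, 0]
packed = pack_trits(group)
assert unpack_trits(packed, len(group)) == group
print(len(packed), "bytes for", len(group), "trits")
```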
Use Cases
This model targets scenarios where memory footprint and inference latency are critical, such as edge devices or resource-constrained systems. Realizing the full benefit requires an inference environment that supports, or can be adapted to handle, the packed trit format efficiently.
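
For environments without trit-aware kernels, the repository can be used like any other Qwen2.5 checkpoint, since the stored weights are dequantized to FP16. A minimal usage sketch with the transformers library (the prompt and generation settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-0.5B-trit-uniform-d1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Arbitrary prompt; any text completion works with the base model.
inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```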