Entrit/Qwen2.5-0.5B-trit-uniform-d4

Text generation · 0.5B parameters · BF16 checkpoint · 32k context length · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Entrit/Qwen2.5-0.5B-trit-uniform-d4 is a 0.5 billion parameter Qwen2.5-based causal language model developed by Entrit. It is a quantized version of the original Qwen2.5-0.5B, using balanced ternary post-training quantization at a depth of 4, resulting in 6.64 bits per weight, and it is optimized for efficient inference on hardware and kernels that can directly consume its packed trit format.
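
The packed trit layout used by tritllm is not documented here; purely as an illustration of why packing trits is compact, the sketch below packs five balanced trits into a single byte (3^5 = 243 ≤ 256, about 1.6 bits per trit), compared with 16 bits per weight in the BF16 checkpoint. The function names and the five-trits-per-byte grouping are illustrative assumptions, not the actual tritllm format.

```python
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack balanced trits (-1/0/+1) into bytes, 5 trits per byte (3**5 = 243 <= 256).

    Illustrative only; the actual tritllm layout may differ.
    """
    digits = trits.astype(np.int64) + 1                     # map {-1, 0, +1} -> {0, 1, 2}
    digits = np.concatenate([digits, np.zeros((-len(digits)) % 5, dtype=np.int64)])
    place = 3 ** np.arange(5)                               # base-3 place values
    return (digits.reshape(-1, 5) * place).sum(axis=1).astype(np.uint8)

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_trits; returns the first n trits as values in {-1, 0, +1}."""
    digits = packed.astype(np.int64)[:, None] // (3 ** np.arange(5)) % 3
    return digits.reshape(-1)[:n] - 1

trits = np.random.choice([-1, 0, 1], size=1000)
packed = pack_trits(trits)
assert np.array_equal(unpack_trits(packed, trits.size), trits)
print(f"{trits.size} trits -> {packed.nbytes} bytes ({8 * packed.nbytes / trits.size:.2f} bits/trit)")
```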


Overview

Entrit/Qwen2.5-0.5B-trit-uniform-d4 is a quantized version of the Qwen/Qwen2.5-0.5B model, developed by Entrit. This model utilizes balanced ternary post-training quantization (PTQ) with a depth of 4, achieving an information content of 6.64 bits per weight. This quantization method is based on research presented in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
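
The paper's exact PTQ procedure is not reproduced here; as a minimal sketch of what "uniform, depth 4" means, the snippet below quantizes a weight tensor to 3^4 = 81 evenly spaced levels, i.e., signed integer codes in [-40, +40] (each such code is exactly representable with four balanced trits) times a symmetric scale. The per-tensor max-abs scale is an assumption for illustration; the real method may use per-group or per-channel scales and calibrated clipping.

```python
import torch

DEPTH = 4                     # trits per weight
LEVELS = 3 ** DEPTH           # 81 quantization levels
QMAX = (LEVELS - 1) // 2      # integer codes span [-40, +40]

def quantize_uniform_trit(w: torch.Tensor, qmax: int = QMAX):
    """Uniform symmetric quantization to 2*qmax + 1 levels (illustrative sketch only)."""
    scale = w.abs().max() / qmax                      # per-tensor scale; real PTQ may use per-group scales
    codes = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return codes, scale

def dequantize(codes: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return codes.to(torch.float32) * scale

w = torch.randn(256, 256)
codes, scale = quantize_uniform_trit(w)
w_hat = dequantize(codes, scale)
print(f"{LEVELS} levels, max |w - w_hat| = {(w - w_hat).abs().max().item():.4f}")
```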

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-0.5B
  • Quantization Method: Uniform PTQ with a depth of 4 (81 levels per weight).
  • Bits per Weight: 6.64, versus 16 bits for FP16 (roughly a 2.4× reduction).
  • Quantized Layers: All 2D linear weight matrices are quantized (see the layer-selection sketch after this list).
  • FP16 Layers: lm_head, token embeddings, and all *_norm layers remain in FP16 for compatibility and performance.
  • Codec: Uses tritllm v2 for the quantization process.
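
As a rough sketch of the layer split above (all 2D linear weights quantized; lm_head, token embeddings, and norm layers left in FP16), the snippet below selects the corresponding modules from the source model with standard transformers/PyTorch APIs. The name-matching heuristic is an illustrative assumption, and the tritllm v2 codec itself is not shown.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.float16)

# Modules kept in FP16 per the model card: lm_head, token embeddings, *_norm layers.
SKIP_KEYS = ("lm_head", "embed_tokens", "norm")

to_quantize = [
    name
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear)               # 2D weight matrices only
    and not any(key in name for key in SKIP_KEYS)
]
print(f"{len(to_quantize)} linear layers selected for ternary PTQ")
```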

Usage and Performance Considerations

The model's on-disk size is similar to its FP16 source because the weights are stored dequantized for stock transformers compatibility; the efficiency gains are realized only with hardware and kernels designed for packed trit formats (e.g., Entrit/tritllm-kernel). It is therefore best suited to deployments where a reduced memory footprint and faster inference on such specialized kernels are the priority.
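
Because the published weights are stored dequantized, the checkpoint loads like any other Qwen2.5 model with stock transformers. The sketch below assumes the repository ships the usual config and tokenizer files; taking advantage of the packed trit path requires the separate Entrit/tritllm-kernel tooling, which is not covered here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-0.5B-trit-uniform-d4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```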