Entrit/Qwen2.5-0.5B-trit-uniform-d2
Entrit/Qwen2.5-0.5B-trit-uniform-d2 is a 0.5-billion-parameter language model based on the Qwen2.5 architecture, developed by Entrit. It features balanced ternary post-training quantization (PTQ) at depth 2, giving an information content of 3.47 bits per weight. The model is designed for efficient inference on hardware that supports packed trit formats, offering a compact representation while remaining compatible with stock transformers through dequantized FP16 weights.
Model Overview
Entrit/Qwen2.5-0.5B-trit-uniform-d2 is a 0.5-billion-parameter model derived from the Qwen2.5-0.5B base model. Its distinguishing feature is balanced ternary post-training quantization (PTQ), developed by Eric Stentzel at Entrit Systems. The scheme uses a depth of 2, i.e. two balanced trits and therefore 3² = 9 levels per weight, which translates to an information content of 3.47 bits per weight.
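To see where the 9 levels come from, one natural realization of a depth-2 balanced ternary code is to combine two trits (t1, t0), each in {-1, 0, +1}, as 3·t1 + t0. The sketch below is purely illustrative; the actual encoding used by the `tritllm v2` codec may differ.

```python
from itertools import product

# Depth 2 = two balanced trits per weight, each trit in {-1, 0, +1}.
# Interpreting the pair (t1, t0) as 3*t1 + t0 gives 3**2 = 9 integer levels.
TRITS = (-1, 0, 1)

levels = sorted({3 * t1 + t0 for t1, t0 in product(TRITS, repeat=2)})
print(levels)       # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
print(len(levels))  # 9 levels per weight
```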
Key Quantization Details
- Source Model: Qwen/Qwen2.5-0.5B
- Quantization Method: Uniform PTQ with a group size of 16.
- Bits per Weight: 3.47, indicating a highly compressed representation.
- Quantized Layers: All 2D linear matrices are quantized.
- FP16 Layers: `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for precision.
- Codec: Uses the `tritllm v2` codec, available via `Entrit/tritllm-codec`.
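For intuition about what these settings mean in practice, here is a minimal sketch of round-to-nearest, symmetric, group-wise quantization to 9 levels with group size 16. This is an assumption-laden illustration, not the `tritllm v2` codec; the function names and the symmetric max-scaling rule are hypothetical.

```python
import torch

GROUP_SIZE = 16   # per the card: uniform PTQ with group size 16
QMAX = 4          # depth-2 balanced ternary integer levels span -4 .. +4 (9 levels)

def quantize_groups(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Illustrative round-to-nearest, symmetric, per-group quantizer.

    Returns integer codes in [-4, 4] plus one FP16 scale per group of 16
    weights. A sketch only -- not the actual tritllm v2 codec.
    """
    groups = w.reshape(-1, GROUP_SIZE)                      # flatten into groups of 16
    scale = groups.abs().amax(dim=1, keepdim=True) / QMAX   # symmetric per-group scale
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    codes = torch.clamp(torch.round(groups / scale), -QMAX, QMAX).to(torch.int8)
    return codes, scale.to(torch.float16)

def dequantize_groups(codes: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    """Reconstruct FP16 weights, which is what the published checkpoint stores."""
    return (codes.float() * scale.float()).reshape(shape).to(torch.float16)

# Example: quantize and reconstruct one linear weight matrix.
w = torch.randn(64, 64)
codes, scale = quantize_groups(w)
w_hat = dequantize_groups(codes, scale, w.shape)
print((w - w_hat.float()).abs().max())  # small reconstruction error
```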
Performance and Use Cases
The weights in this repository are dequantized to FP16 for compatibility with stock transformers, so the on-disk size matches the FP16 source. The efficiency benefit is realized only when the packed trit format is consumed directly by capable hardware, which makes the model particularly suitable for:
- Resource-constrained environments: Where memory footprint and computational efficiency are critical.
- Specialized hardware: Designed to leverage ternary or low-bit quantization for faster inference.
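Because the checkpoint ships dequantized FP16 weights, it should load like any other Qwen2.5 model through the standard transformers API. A minimal usage sketch (not verified against this specific repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Entrit/Qwen2.5-0.5B-trit-uniform-d2"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16)

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```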
This model represents an exploration into highly efficient model deployment through advanced quantization techniques, as detailed in the forthcoming paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).