Entrit/Qwen2.5-1.5B-trit-uniform-d4

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a 1.5 billion parameter Qwen2.5 model from Entrit, featuring balanced ternary post-training quantization. The model uses a depth-4 scheme (four trits per weight), giving an effective 6.64 bits per weight and substantially reducing the information content of the weights relative to the FP16 source. It is optimized for efficient inference on hardware that can consume packed trit formats, making it suitable for resource-constrained environments.

Overview

Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a quantized version, produced by Entrit, of the Qwen/Qwen2.5-1.5B large language model. It implements the balanced ternary post-training quantization (PTQ) method described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-1.5B
  • Quantization Depth: d=4, yielding 81 levels per weight.
  • Bits per Weight: An effective 6.64 bits per weight, a substantial reduction from the 16 bits per weight of the FP16 source.
  • Method: Uniform PTQ applied to all 2D linear weight matrices (see the sketch after this list).
  • Exclusions: lm_head, token embeddings, and all *_norm layers are kept in FP16 for compatibility and performance.
  • Codec: Utilizes the tritllm v2 codec, available at Entrit/tritllm-codec.
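
To make the depth-4 scheme concrete, the following is a minimal sketch of uniform balanced-ternary PTQ on a single weight matrix: weights are rounded to the 81 signed levels in [-40, 40] and then dequantized back to FP16. The per-tensor scale, the function names, and the plain-PyTorch implementation are illustrative assumptions; the released checkpoint was produced with the tritllm v2 codec, which may differ in scale granularity and packing.

```python
import torch

def quantize_balanced_ternary(weight: torch.Tensor, depth: int = 4):
    """Round a weight matrix to balanced-ternary levels of the given depth.

    depth trits per weight give 3**depth levels (81 for depth=4), i.e. the
    symmetric integer range [-(3**depth - 1) // 2, +(3**depth - 1) // 2].
    A single per-tensor scale is assumed here purely for illustration.
    """
    max_level = (3 ** depth - 1) // 2           # 40 for depth=4
    scale = weight.abs().max() / max_level      # map the largest weight to +/-40
    q = torch.clamp(torch.round(weight / scale), -max_level, max_level)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation of the original weights."""
    return (q.to(torch.float32) * scale).to(torch.float16)

# Round-trip a random linear weight matrix and inspect the error.
w = torch.randn(2048, 2048)
q, s = quantize_balanced_ternary(w, depth=4)
w_hat = dequantize(q, s)
print("levels used:", q.unique().numel())
print("max abs error:", (w - w_hat.float()).abs().max().item())
```

Mapping the signed integers back to trit strings and packing them for trit-native hardware is the codec's job and is not shown here.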

Performance and Use Cases

While the weights are dequantized to FP16 for standard transformers compatibility (maintaining the same on-disk size as the FP16 source), the true benefit of this model lies in its reduced information content. This makes it particularly well-suited for inference on specialized hardware that can directly process the packed trit format, such as systems leveraging the Entrit/tritllm-kernel. Developers can load this model using standard Hugging Face transformers library calls.
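
Because the published weights are plain FP16, loading works like any other Qwen2.5 checkpoint. The snippet below is a generic transformers usage example; the prompt and generation settings are placeholders rather than recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d4"

# Weights are stored dequantized to FP16, so this behaves like the FP16 source model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Inference on trit-native hardware via Entrit/tritllm-kernel would consume the packed format instead; that path is not covered by this generic example.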