Name: Entrit/Qwen2.5-7B-trit-uniform-d1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Entrit

Entrit/Qwen2.5-7B-trit-uniform-d1: Balanced Ternary Quantization

This model is a 7.6-billion-parameter variant of the Qwen2.5-7B architecture, developed by Entrit Systems. It applies balanced ternary post-training quantization (PTQ) at a depth of 1: each quantized weight takes one of 3 levels, for a reported information content of 1.88 bits per weight. The quantization is based on research presented in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
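The figures in this description can be put side by side with a few lines of arithmetic. The sketch below is illustrative only: it treats every parameter as if it were stored at a single bit width, ignoring any layers kept in FP16, so the sizes are idealized estimates rather than measured values. The description does not say how the 1.88 figure is derived; a single three-level symbol carries log2(3), about 1.585 bits, and the reported value is simply taken as given here. The names `PARAMS`, `BPW_REPORTED`, and `size_gb` are hypothetical helpers for this example.

```python
import math

PARAMS = 7.6e9        # parameter count stated in the description
BPW_REPORTED = 1.88   # bits per weight stated in the description
BPW_FP16 = 16         # bits per weight for an FP16 tensor

# Raw information content of one symbol drawn from 3 equally likely levels.
bits_per_trit = math.log2(3)


def size_gb(params: float, bits_per_weight: float) -> float:
    """Idealized storage in GB (10**9 bytes) if every weight used this width."""
    return params * bits_per_weight / 8 / 1e9


print(f"bits per three-level symbol: {bits_per_trit:.3f}")
print(f"idealized FP16 size:         {size_gb(PARAMS, BPW_FP16):.2f} GB")
print(f"idealized 1.88-bpw size:     {size_gb(PARAMS, BPW_REPORTED):.2f} GB")
```

Under these assumptions the ratio between the two sizes is 16 / 1.88, roughly 8.5x, which is the gap a packed representation could in principle close.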

Key Quantization Details

Source Model: Qwen/Qwen2.5-7B
Quantization Depth: d=1 (3 levels per weight)
Bits per Weight: 1.88
Quantization Method: Uniform PTQ, applied to all 2D linear matrices.
Exclusions: lm_head, token embeddings, and all *_norm layers remain in FP16.
Codec: tritllm v2
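As a rough illustration of what three-level uniform quantization of a weight row can look like, here is a minimal sketch. It is not the tritllm v2 codec, whose internals are not described here. It assumes one scale per row equal to the row's largest absolute value, with levels at -scale, 0, and +scale and round-to-nearest assignment. The function names `quantize_row` and `dequantize_row` are hypothetical.

```python
from typing import List, Tuple


def quantize_row(weights: List[float]) -> Tuple[List[int], float]:
    """Map one row of weights to trits in {-1, 0, +1} plus one scale.

    Assumption: the scale is the row's largest absolute value, so the three
    uniformly spaced levels are -scale, 0, +scale.
    """
    scale = max((abs(w) for w in weights), default=0.0)
    if scale == 0.0:
        # An all-zero (or empty) row quantizes to all-zero trits.
        return [0] * len(weights), 0.0

    trits = []
    for w in weights:
        x = w / scale  # normalized to [-1, 1]
        # Round to the nearest of -1, 0, +1 (decision boundaries at +/-0.5).
        if x > 0.5:
            trits.append(1)
        elif x < -0.5:
            trits.append(-1)
        else:
            trits.append(0)
    return trits, scale


def dequantize_row(trits: List[int], scale: float) -> List[float]:
    """Reconstruct floating-point weights from trits and the row scale."""
    return [t * scale for t in trits]


row = [0.9, -0.1, 0.4, -0.7, 0.05]
trits, scale = quantize_row(row)
restored = dequantize_row(trits, scale)
print(trits, scale, restored)
```

In this sketch, values within half a scale of zero collapse to 0 and the rest snap to plus or minus the scale. A dequantized tensor produced this way has the same shape and dtype footprint as the original, even though each entry only takes one of three values.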

Performance and Compatibility

The checkpoint stores weights dequantized to FP16 for transformers compatibility, so its on-disk size is the same as the FP16 source model. The benefit lies in the 1.88-bpw information content of the quantized weights: hardware or runtimes that can process packed trit formats directly could reduce memory use and computation, although this checkpoint does not deliver those savings by itself. The model loads and runs with standard transformers library functions, using the FP16 weights at runtime.
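To make the idea of a packed trit format concrete, the sketch below packs five trits into one byte using base-3 digits, which works because 3**5 = 243 fits within the 256 values of a byte. This is a generic packing scheme chosen for illustration, giving 8 / 5 = 1.6 bits per weight before any scales or metadata. It is an assumption, not the layout used by tritllm v2 or by any particular hardware. The names `pack_trits` and `unpack_trits` are hypothetical.

```python
from typing import List

TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five trits fit in one byte


def pack_trits(trits: List[int]) -> bytes:
    """Pack trits in {-1, 0, +1} into bytes, five per byte, base-3 encoded.

    The first trit of each group is the least significant base-3 digit.
    """
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        group = trits[i:i + TRITS_PER_BYTE]
        value = 0
        for t in reversed(group):
            if t not in (-1, 0, 1):
                raise ValueError(f"not a balanced trit: {t!r}")
            value = value * 3 + (t + 1)  # shift {-1, 0, 1} to digits {0, 1, 2}
        out.append(value)
    return bytes(out)


def unpack_trits(data: bytes, count: int) -> List[int]:
    """Inverse of pack_trits; count is the original number of trits."""
    trits = []
    for byte in data:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)  # digit {0, 1, 2} back to {-1, 0, 1}
            byte //= 3
    return trits[:count]  # drop padding digits from a partial final group


example = [1, 0, 0, -1, 0, -1, 1]
packed = pack_trits(example)
assert unpack_trits(packed, len(example)) == example
print(len(example), "trits ->", len(packed), "bytes")
```

The caller has to remember the original trit count, since a partial final group is padded out to a full byte. A real format would also need to store the scales used for dequantization.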
