Entrit/Qwen2.5-7B-trit-uniform-d3
Entrit/Qwen2.5-7B-trit-uniform-d3 is a 7.6-billion-parameter language model developed by Entrit, based on the Qwen2.5-7B architecture. It applies balanced ternary post-training quantization (PTQ) at depth d=3, giving an information content of 5.05 bits per weight. The model is optimized for efficient inference on hardware that can consume packed trit formats, offering a compact representation of the original Qwen2.5-7B model.
Overview
Entrit/Qwen2.5-7B-trit-uniform-d3 is a quantized version of the Qwen/Qwen2.5-7B model, developed by Entrit Systems. It employs a novel balanced ternary post-training quantization (PTQ) method, as detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" by Eric Stentzel (2026).
Key Quantization Details
- Source Model: Qwen/Qwen2.5-7B
- Quantization Method: Uniform PTQ at depth d=3, i.e. three balanced trits per weight, yielding 3^3 = 27 levels (see the sketch after this list).
- Efficiency: Achieves an information content of 5.05 bits per weight, down from 16 bits per weight in the FP16 source.
- Compatibility: The on-disk size remains similar to the FP16 source because the weights are dequantized for stock `transformers` compatibility; the true efficiency is realized on hardware that directly processes the packed trit format.
- Quantized Layers: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and all `*_norm` layers are kept in FP16 for stability.
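
To make the scheme above concrete, here is a minimal NumPy sketch (not Entrit's implementation) of uniform symmetric quantization to the 27 balanced-ternary levels {-13, ..., +13}, balanced-trit encoding, and one illustrative way to pack five trits per byte (valid since 3^5 = 243 <= 256). The function names, the single per-tensor scale, and the packing layout are assumptions for illustration; three trits carry log2(27) ≈ 4.75 bits of raw payload, and the gap to the stated 5.05 bits per weight is presumably scale/metadata overhead.

```python
import numpy as np

D = 3                      # trits per weight (quantization depth d=3)
LMAX = (3 ** D - 1) // 2   # 13 -> integer levels -13..13, i.e. 3^3 = 27 levels

def quantize(w: np.ndarray):
    """Uniform symmetric PTQ: round scaled weights to the 27 ternary levels.
    A single per-tensor scale is an assumption; real PTQ pipelines often use
    per-channel or per-group scales (the metadata that can push the stored
    cost above the raw ~4.75 bits per weight)."""
    scale = np.abs(w).max() / LMAX
    q = np.clip(np.round(w / scale), -LMAX, LMAX).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def to_balanced_trits(q: np.ndarray) -> np.ndarray:
    """Encode each integer level as D balanced trits in {-1, 0, +1},
    least-significant trit first (value = sum(trit[i] * 3**i))."""
    v = q.astype(np.int64)
    trits = np.empty(q.shape + (D,), dtype=np.int8)
    for i in range(D):
        r = ((v + 1) % 3) - 1   # balanced remainder in {-1, 0, +1}
        trits[..., i] = r
        v = (v - r) // 3
    return trits

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """One illustrative packing: 5 trits per byte (3^5 = 243 <= 256).
    The repo's actual on-disk layout is not specified in this card."""
    t = trits.reshape(-1).astype(np.int64) + 1              # {-1,0,1} -> {0,1,2}
    t = np.pad(t, (0, (-t.size) % 5), constant_values=1)    # pad with trit 0
    codes = (t.reshape(-1, 5) * 3 ** np.arange(5)).sum(axis=1)
    return codes.astype(np.uint8)

# Round-trip check on a toy weight matrix
w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize(w)
assert np.array_equal((to_balanced_trits(q) * 3 ** np.arange(D)).sum(axis=-1), q)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```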
Use Cases
This model is suited to applications that need a reduced memory footprint and faster inference when deployed on specialized hardware that handles balanced ternary formats natively. It provides a compact representation of the powerful Qwen2.5-7B model, making it a good fit for edge devices and other environments with strict resource constraints.
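
If the repository follows the stock `transformers` layout implied by the compatibility note above, loading should work like any other Qwen2.5 checkpoint. A minimal, unverified sketch (`device_map="auto"` assumes `accelerate` is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # FP16 weights per the compatibility note above
    device_map="auto",    # requires `accelerate`
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that until kernels consuming the packed trit format directly are available, this path runs the dequantized FP16 weights, so it demonstrates correctness rather than the memory savings described above.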