Entrit/Qwen2.5-7B-trit-uniform-d4: Quantized Qwen2.5-7B
This model is a balanced ternary post-training quantized version of the 7.6 billion parameter Qwen/Qwen2.5-7B model. Developed by Entrit Systems, it employs a novel quantization codec described in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Key Quantization Details
- Quantization Method: Uniform Post-Training Quantization (PTQ).
- Depth: d=4 balanced-ternary digits per weight, giving 3^4 = 81 quantization levels.
- Bits per Weight: Achieves an information content of 6.64 bits per weight, significantly reducing the effective size of the model's matrices.
- Source Model: Based on the robust Qwen2.5-7B architecture.
- Codec: Utilizes `tritllm v2` for the quantization process.
- Layers Quantized: All 2D linear matrices are quantized, while `lm_head`, token embeddings, and all `*_norm` layers remain in FP16 for stability (see the sketch below).
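
To make the policy above concrete, here is a minimal, hypothetical sketch of uniform quantization to 3^4 = 81 levels applied to the 2D linear weights while skipping `lm_head`, embeddings, and norm layers. The layer-selection rule and per-output-channel scaling are illustrative assumptions, not the `tritllm v2` codec itself.

```python
import torch
import torch.nn as nn

def fake_quantize_linear_weights(model: nn.Module, depth: int = 4) -> None:
    """Round every 2D linear weight to 3**depth uniform levels, in place."""
    levels = 3 ** depth                    # d=4 -> 81 levels
    max_level = (levels - 1) // 2          # balanced range: -40 .. +40
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue                       # embeddings and *_norm layers stay FP16
        if "lm_head" in name:
            continue                       # lm_head stays FP16 as well
        w = module.weight.data
        # Hypothetical per-output-channel scale: largest weight maps to +/- max_level.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / max_level
        q = torch.round(w / scale).clamp_(-max_level, max_level)
        module.weight.data = (q * scale).to(w.dtype)  # stored dequantized, as described below
```

Because four balanced-ternary digits cover every integer from -40 to +40, rounding to those integers and rescaling is exactly a uniform 81-level quantizer.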
Performance and Compatibility
While the on-disk size is equivalent to the FP16 source due to dequantization for transformers compatibility, the 6.64-bpw figure represents the true information content. This makes the model particularly suitable for inference on specialized hardware that can directly process the packed trit format, leveraging the reduced bit-width for improved efficiency. The associated codec and kernel are available via Entrit/tritllm-codec and Entrit/tritllm-kernel respectively.
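
Since the repository stores dequantized FP16 weights, the checkpoint loads like any other Qwen2.5 model through the standard transformers API. The snippet below is a minimal usage sketch; the prompt and generation settings are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-7B-trit-uniform-d4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="float16", device_map="auto"
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```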