Entrit/Qwen2.5-1.5B-trit-uniform-d2

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-1.5B-trit-uniform-d2 is a 1.5 billion parameter language model based on the Qwen2.5 architecture, developed by Entrit Systems. This model features balanced ternary post-training quantization at a depth of d=2, resulting in an information content of 3.47 bits per weight. It is optimized for efficient inference on hardware capable of consuming packed trit formats, making it suitable for resource-constrained environments.


Entrit/Qwen2.5-1.5B-trit-uniform-d2: A Quantized Qwen2.5 Model

This model is a 1.5 billion parameter variant of the Qwen2.5 architecture, developed by Entrit Systems. Its primary distinction is its balanced ternary post-training quantization (PTQ), applied at a depth of d=2 (two balanced-ternary digits per weight), which yields 9 levels per weight and an information content of 3.47 bits per weight. The quantization was performed with the codec from "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
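As a quick sanity check on the 3.47 figure: two balanced-ternary digits carry log2(9) ≈ 3.17 bits, and the remaining ≈0.30 bits per weight is consistent with one index into the 27-entry scale codebook (described below) amortized over a group of 16 weights. This decomposition is an assumption; the card only states the total.

```python
import math

# Assumed decomposition of the 3.47 bits/weight figure: 2 balanced-ternary
# digits per weight, plus one 27-entry scale-codebook index per group of 16.
trits_per_weight = 2                    # depth d = 2
levels = 3 ** trits_per_weight          # 9 levels per weight
group_size = 16
codebook_entries = 27

payload_bits = math.log2(levels)                       # ~3.17 bits/weight
scale_bits = math.log2(codebook_entries) / group_size  # ~0.30 bits/weight
print(round(payload_bits + scale_bits, 2))             # ~3.47 bits/weight
```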

Key Capabilities and Features

  • Efficient Quantization: Utilizes a uniform PTQ method with a group size of 16 and a 27-entry log-spaced scale codebook (an illustrative sketch follows this list).
  • Reduced Information Content: Achieves 3.47 bits per weight, making it highly efficient for specialized hardware that can process packed trit formats directly.
  • transformers Compatibility: The published weights are dequantized to FP16 so the checkpoint loads with the standard transformers library; the on-disk size therefore matches the FP16 source, and the efficiency gain comes from the underlying trit representation rather than the stored file size.
  • Targeted Quantization: All 2D linear matrices are quantized, while lm_head, token embeddings, and all *_norm layers are kept in FP16 to preserve critical model components.
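Below is a minimal, illustrative sketch of what group-wise balanced-ternary quantization at d=2 with a log-spaced scale codebook can look like. It is not the codec from Stentzel (2026): the codebook range, the scale search, and the rounding rule here are assumptions made purely to show the shape of the scheme.

```python
import numpy as np

D = 2                      # balanced-ternary depth: 2 trits per weight
GROUP = 16                 # weights sharing one scale
QMAX = (3 ** D - 1) // 2   # symmetric integer range -4 .. +4 (9 levels)
# Hypothetical 27-entry log-spaced scale codebook; the real range/spacing is not given here.
CODEBOOK = np.logspace(-4, 0, 27)

def quantize_group(w: np.ndarray) -> tuple[int, np.ndarray]:
    """Map one group of weights to a scale index plus 9-level integers in [-4, 4]."""
    target = max(np.abs(w).max() / QMAX, CODEBOOK[0])
    scale_idx = int(np.argmin(np.abs(np.log(CODEBOOK) - np.log(target))))
    q = np.clip(np.round(w / CODEBOOK[scale_idx]), -QMAX, QMAX).astype(np.int8)
    return scale_idx, q

def dequantize_group(scale_idx: int, q: np.ndarray) -> np.ndarray:
    """Reconstruct floating-point weights, as done when exporting to FP16 for transformers."""
    return q.astype(np.float32) * CODEBOOK[scale_idx]

w = np.random.randn(GROUP).astype(np.float32) * 0.02
idx, q = quantize_group(w)
print(idx, q, float(np.abs(dequantize_group(idx, q) - w).max()))
```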

When to Use This Model

This model is particularly well-suited for applications where memory footprint and inference speed on specialized hardware are critical. Developers looking to experiment with or deploy models leveraging balanced ternary quantization for improved efficiency will find this model valuable. It provides a quantized version of the Qwen2.5-1.5B base model, offering a balance between performance and resource optimization.
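Because the released weights are stored dequantized to FP16, loading should follow the standard transformers path for a Qwen2.5 checkpoint. A loading sketch (the dtype and device settings here are assumptions, not requirements from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are shipped dequantized to FP16
    device_map="auto",
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```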