Entrit/Qwen2.5-1.5B-trit-uniform-d2

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-1.5B-trit-uniform-d2 is a 1.5 billion parameter language model based on the Qwen2.5 architecture, developed by Entrit Systems. This model features balanced ternary post-training quantization at a depth of d=2, resulting in an information content of 3.47 bits per weight. It is optimized for efficient inference on hardware capable of consuming packed trit formats, making it suitable for resource-constrained environments.


Entrit/Qwen2.5-1.5B-trit-uniform-d2: A Quantized Qwen2.5 Model

This model is a 1.5 billion parameter variant of the Qwen2.5 architecture, developed by Entrit Systems. Its primary distinction is its balanced ternary post-training quantization (PTQ), applied at a depth of d=2 (two balanced-ternary digits per weight), which yields 9 levels per weight and an information content of 3.47 bits per weight. The quantization was performed with the codec from "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
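As a quick sanity check on the 3.47 figure: two balanced-ternary digits carry log2(9) ≈ 3.17 bits, and the remaining ≈0.30 bits per weight is consistent with one index into the 27-entry scale codebook (described below) amortized over a group of 16 weights. This decomposition is an assumption; the card only states the total.

```python
import math

# Assumed decomposition of the 3.47 bits/weight figure: 2 balanced-ternary
# digits per weight, plus one 27-entry scale-codebook index per group of 16.
trits_per_weight = 2                    # depth d = 2
levels = 3 ** trits_per_weight          # 9 levels per weight
group_size = 16
codebook_entries = 27

payload_bits = math.log2(levels)                       # ~3.17 bits/weight
scale_bits = math.log2(codebook_entries) / group_size  # ~0.30 bits/weight
print(round(payload_bits + scale_bits, 2))             # ~3.47 bits/weight
```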

Key Capabilities and Features

  • Efficient Quantization: Utilizes a uniform PTQ method with a group size of 16 and a 27-entry log-spaced scale codebook (an illustrative sketch follows this list).
  • Reduced Information Content: Achieves 3.47 bits per weight, making it highly efficient for specialized hardware that can process packed trit formats directly.
  • transformers Compatibility: The published weights are dequantized to FP16 so the checkpoint loads with the standard transformers library; the on-disk size therefore matches the FP16 source, and the efficiency gain comes from the underlying trit representation rather than the stored file size.
  • Targeted Quantization: All 2D linear matrices are quantized, while lm_head, token embeddings, and all *_norm layers are kept in FP16 to preserve critical model components.
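Below is a minimal, illustrative sketch of what group-wise balanced-ternary quantization at d=2 with a log-spaced scale codebook can look like. It is not the codec from Stentzel (2026): the codebook range, the scale search, and the rounding rule here are assumptions made purely to show the shape of the scheme.

```python
import numpy as np

D = 2                      # balanced-ternary depth: 2 trits per weight
GROUP = 16                 # weights sharing one scale
QMAX = (3 ** D - 1) // 2   # symmetric integer range -4 .. +4 (9 levels)
# Hypothetical 27-entry log-spaced scale codebook; the real range/spacing is not given here.
CODEBOOK = np.logspace(-4, 0, 27)

def quantize_group(w: np.ndarray) -> tuple[int, np.ndarray]:
    """Map one group of weights to a scale index plus 9-level integers in [-4, 4]."""
    target = max(np.abs(w).max() / QMAX, CODEBOOK[0])
    scale_idx = int(np.argmin(np.abs(np.log(CODEBOOK) - np.log(target))))
    q = np.clip(np.round(w / CODEBOOK[scale_idx]), -QMAX, QMAX).astype(np.int8)
    return scale_idx, q

def dequantize_group(scale_idx: int, q: np.ndarray) -> np.ndarray:
    """Reconstruct floating-point weights, as done when exporting to FP16 for transformers."""
    return q.astype(np.float32) * CODEBOOK[scale_idx]

w = np.random.randn(GROUP).astype(np.float32) * 0.02
idx, q = quantize_group(w)
print(idx, q, float(np.abs(dequantize_group(idx, q) - w).max()))
```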

When to Use This Model

This model is particularly well-suited for applications where memory footprint and inference speed on specialized hardware are critical. Developers looking to experiment with or deploy models leveraging balanced ternary quantization for improved efficiency will find this model valuable. It provides a quantized version of the Qwen2.5-1.5B base model, offering a balance between performance and resource optimization.
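Because the released weights are stored dequantized to FP16, loading should follow the standard transformers path for a Qwen2.5 checkpoint. A loading sketch (the dtype and device settings here are assumptions, not requirements from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are shipped dequantized to FP16
    device_map="auto",
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```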