Entrit/Qwen2.5-1.5B-trit-uniform-d4

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a 1.5 billion parameter Qwen2.5 model from Entrit, featuring balanced ternary post-training quantization. The model uses a depth-4 scheme (four trits per weight), giving an effective 6.64 bits per weight and substantially reducing the information content of the weights relative to the FP16 source. It is optimized for efficient inference on hardware that can consume packed trit formats, making it suitable for resource-constrained environments.

Overview

Entrit/Qwen2.5-1.5B-trit-uniform-d4 is a quantized version, produced by Entrit, of the Qwen/Qwen2.5-1.5B large language model. It implements the balanced ternary post-training quantization (PTQ) method described in "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

  • Source Model: Qwen/Qwen2.5-1.5B
  • Quantization Depth: d=4, yielding 81 levels per weight.
  • Bits per Weight: An effective 6.64 bits per weight, a substantial reduction from the 16 bits per weight of the FP16 source.
  • Method: Uniform PTQ applied to all 2D linear weight matrices (see the sketch after this list).
  • Exclusions: lm_head, token embeddings, and all *_norm layers are kept in FP16 for compatibility and performance.
  • Codec: Utilizes the tritllm v2 codec, available at Entrit/tritllm-codec.
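
To make the depth-4 scheme concrete, the following is a minimal sketch of uniform balanced-ternary PTQ on a single weight matrix: weights are rounded to the 81 signed levels in [-40, 40] and then dequantized back to FP16. The per-tensor scale, the function names, and the plain-PyTorch implementation are illustrative assumptions; the released checkpoint was produced with the tritllm v2 codec, which may differ in scale granularity and packing.

```python
import torch

def quantize_balanced_ternary(weight: torch.Tensor, depth: int = 4):
    """Round a weight matrix to balanced-ternary levels of the given depth.

    depth trits per weight give 3**depth levels (81 for depth=4), i.e. the
    symmetric integer range [-(3**depth - 1) // 2, +(3**depth - 1) // 2].
    A single per-tensor scale is assumed here purely for illustration.
    """
    max_level = (3 ** depth - 1) // 2           # 40 for depth=4
    scale = weight.abs().max() / max_level      # map the largest weight to +/-40
    q = torch.clamp(torch.round(weight / scale), -max_level, max_level)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP16 approximation of the original weights."""
    return (q.to(torch.float32) * scale).to(torch.float16)

# Round-trip a random linear weight matrix and inspect the error.
w = torch.randn(2048, 2048)
q, s = quantize_balanced_ternary(w, depth=4)
w_hat = dequantize(q, s)
print("levels used:", q.unique().numel())
print("max abs error:", (w - w_hat.float()).abs().max().item())
```

Mapping the signed integers back to trit strings and packing them for trit-native hardware is the codec's job and is not shown here.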

Performance and Use Cases

While the weights are dequantized to FP16 for standard transformers compatibility (maintaining the same on-disk size as the FP16 source), the true benefit of this model lies in its reduced information content. This makes it particularly well-suited for inference on specialized hardware that can directly process the packed trit format, such as systems leveraging the Entrit/tritllm-kernel. Developers can load this model using standard Hugging Face transformers library calls.
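
Because the published weights are plain FP16, loading works like any other Qwen2.5 checkpoint. The snippet below is a generic transformers usage example; the prompt and generation settings are placeholders rather than recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Qwen2.5-1.5B-trit-uniform-d4"

# Weights are stored dequantized to FP16, so this behaves like the FP16 source model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Balanced ternary quantization is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Inference on trit-native hardware via Entrit/tritllm-kernel would consume the packed format instead; that path is not covered by this generic example.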