Entrit/Mistral-7B-v0.3-trit-uniform-d3

Text generation · Model size: 7B · Context length: 4k · License: apache-2.0 · Published: May 4, 2026 · Architecture: Transformer (open weights)

Entrit/Mistral-7B-v0.3-trit-uniform-d3 is a 7-billion-parameter language model: a balanced-ternary quantized version of Mistral-7B-v0.3 developed by Entrit. It uses a depth d=3 quantization scheme, reaching 5.05 bits per weight for efficient inference. The model targets a reduced memory footprint and faster processing on hardware that consumes packed trit formats, making it suitable for resource-constrained environments. It remains compatible with stock Hugging Face Transformers because the published weights are dequantized to FP16 for standard use.


Entrit/Mistral-7B-v0.3-trit-uniform-d3: Balanced Ternary Quantization

This model is a 7 billion parameter variant of mistralai/Mistral-7B-v0.3, featuring balanced ternary post-training quantization (PTQ) developed by Entrit. It implements a depth d=3 quantization, resulting in 27 levels per weight and an information content of 5.05 bits per weight.
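
To make the scheme concrete, here is a minimal NumPy sketch of uniform depth d=3 quantization with group size 16: each weight is snapped to one of 3^3 = 27 integer levels in [-13, +13] under a per-group scale, and each level decomposes into three balanced trits in {-1, 0, +1}. The scale choice, rounding, and packing shown here are illustrative assumptions; the actual tritllm-codec may differ.

```python
import numpy as np

# Illustration only: uniform balanced-ternary (depth d=3) quantization of one
# group of 16 weights. Each weight maps to one of 3**3 = 27 integer levels in
# [-13, +13] under a shared per-group scale; each level decomposes into three
# balanced trits in {-1, 0, +1}. Scale choice, rounding, and packing are
# assumptions, not the exact tritllm-codec behaviour.

DEPTH = 3
LEVELS = 3 ** DEPTH            # 27 levels per weight
QMAX = (LEVELS - 1) // 2       # 13 -> grid is -13..+13
GROUP_SIZE = 16

def quantize_group(w):
    """Round a group of weights onto the 27-level balanced grid."""
    max_abs = np.abs(w).max()
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -QMAX, QMAX).astype(np.int8)
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct FP16 weights, as shipped for stock transformers loading."""
    return (q.astype(np.float32) * scale).astype(np.float16)

def to_trits(level, depth=DEPTH):
    """Decompose a signed level in [-13, 13] into `depth` balanced trits."""
    trits, n = [], int(level)
    for _ in range(depth):
        r = ((n + 1) % 3) - 1    # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits                 # least-significant trit first

rng = np.random.default_rng(0)
group = rng.normal(scale=0.02, size=GROUP_SIZE).astype(np.float32)
q, scale = quantize_group(group)
print("levels:", q)                                   # integers in [-13, 13]
print("trits of first weight:", to_trits(q[0]))
print("max abs error:", float(np.abs(dequantize_group(q, scale) - group).max()))
```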

Key Capabilities & Features

  • Efficient Quantization: Utilizes a uniform PTQ method with a group size of 16, quantizing all 2D linear matrices.
  • Reduced Information Content: Achieves a significant reduction in the information content of weights (5.05 bpw), beneficial for specialized hardware inference.
  • transformers Compatibility: The published weights are dequantized to FP16, so the model loads with the standard Hugging Face transformers library without custom code.
  • Selective Quantization: Key components such as lm_head, token embeddings, and *_norm layers are kept in FP16 to preserve model quality (see the sketch after this list).
  • Codec Availability: Produced using the tritllm-codec from Entrit, based on research by Stentzel (2026).
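
The selective rule above can be expressed as a small filter over the checkpoint: only 2D linear weight matrices are quantized group-wise (and dequantized back to FP16), while lm_head, token embeddings, and norm weights pass through untouched. The name patterns and helpers below are hypothetical, following the Mistral module naming in transformers; they are not the codec's exact filter.

```python
import torch

# Hypothetical sketch of the selective, group-wise quantization described above.
# Only 2D linear weight matrices are quantized; lm_head, token embeddings, and
# *_norm weights stay in FP16. The name patterns follow the Mistral module
# naming in transformers and are an assumption, not the codec's exact filter.

SKIP_PATTERNS = ("lm_head", "embed_tokens", "norm")
GROUP_SIZE = 16
QMAX = 13  # 27-level balanced grid: integers in [-13, +13]

def should_quantize(name: str, tensor: torch.Tensor) -> bool:
    if tensor.ndim != 2:                       # only 2D linear matrices
        return False
    return not any(p in name for p in SKIP_PATTERNS)

def fake_quantize_groupwise(w: torch.Tensor) -> torch.Tensor:
    """Quantize each group of 16 weights to the 27-level grid, then back to FP16."""
    rows, cols = w.shape                       # cols must be divisible by GROUP_SIZE
    g = w.float().reshape(rows, cols // GROUP_SIZE, GROUP_SIZE)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / QMAX
    q = torch.round(g / scale).clamp(-QMAX, QMAX)
    return (q * scale).reshape(rows, cols).to(torch.float16)

def fake_quantize_state_dict(state_dict):
    """Apply the rule above across a full checkpoint (quantize-dequantize)."""
    return {
        name: fake_quantize_groupwise(t) if should_quantize(name, t) else t
        for name, t in state_dict.items()
    }
```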

When to Use This Model

  • Resource-Constrained Deployment: Ideal for scenarios where memory footprint and inference speed are critical, especially on hardware designed to consume packed trit formats directly.
  • Research in Quantization: Useful for researchers exploring balanced ternary quantization and its impact on LLM performance.
  • Standard LLM Tasks: Can be used for general-purpose language generation and understanding tasks, leveraging the underlying Mistral-7B-v0.3 architecture with quantization benefits (see the usage example below).
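
Because the published weights are already dequantized to FP16, the checkpoint loads like any other Mistral-style model. A minimal generation example (the repository id is taken from this card; everything else is standard transformers usage):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Mistral-7B-v0.3-trit-uniform-d3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # weights are stored dequantized to FP16
    device_map="auto",
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```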