Entrit/Mistral-7B-v0.3-trit-uniform-d3

Text generation · Model size: 7B · Context length: 4k · License: apache-2.0 · Published: May 4, 2026 · Architecture: Transformer (open weights)

Entrit/Mistral-7B-v0.3-trit-uniform-d3 is a 7-billion-parameter language model: a balanced-ternary quantized version of Mistral-7B-v0.3 developed by Entrit. It uses a depth d=3 quantization scheme, reaching 5.05 bits per weight for efficient inference. The model targets a reduced memory footprint and faster processing on hardware that consumes packed trit formats, making it suitable for resource-constrained environments. It remains compatible with stock Hugging Face Transformers because the published weights are dequantized to FP16 for standard use.


Entrit/Mistral-7B-v0.3-trit-uniform-d3: Balanced Ternary Quantization

This model is a 7 billion parameter variant of mistralai/Mistral-7B-v0.3, featuring balanced ternary post-training quantization (PTQ) developed by Entrit. It implements a depth d=3 quantization, resulting in 27 levels per weight and an information content of 5.05 bits per weight.
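
To make the scheme concrete, here is a minimal NumPy sketch of uniform depth d=3 quantization with group size 16: each weight is snapped to one of 3^3 = 27 integer levels in [-13, +13] under a per-group scale, and each level decomposes into three balanced trits in {-1, 0, +1}. The scale choice, rounding, and packing shown here are illustrative assumptions; the actual tritllm-codec may differ.

```python
import numpy as np

# Illustration only: uniform balanced-ternary (depth d=3) quantization of one
# group of 16 weights. Each weight maps to one of 3**3 = 27 integer levels in
# [-13, +13] under a shared per-group scale; each level decomposes into three
# balanced trits in {-1, 0, +1}. Scale choice, rounding, and packing are
# assumptions, not the exact tritllm-codec behaviour.

DEPTH = 3
LEVELS = 3 ** DEPTH            # 27 levels per weight
QMAX = (LEVELS - 1) // 2       # 13 -> grid is -13..+13
GROUP_SIZE = 16

def quantize_group(w):
    """Round a group of weights onto the 27-level balanced grid."""
    max_abs = np.abs(w).max()
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -QMAX, QMAX).astype(np.int8)
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct FP16 weights, as shipped for stock transformers loading."""
    return (q.astype(np.float32) * scale).astype(np.float16)

def to_trits(level, depth=DEPTH):
    """Decompose a signed level in [-13, 13] into `depth` balanced trits."""
    trits, n = [], int(level)
    for _ in range(depth):
        r = ((n + 1) % 3) - 1    # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits                 # least-significant trit first

rng = np.random.default_rng(0)
group = rng.normal(scale=0.02, size=GROUP_SIZE).astype(np.float32)
q, scale = quantize_group(group)
print("levels:", q)                                   # integers in [-13, 13]
print("trits of first weight:", to_trits(q[0]))
print("max abs error:", float(np.abs(dequantize_group(q, scale) - group).max()))
```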

Key Capabilities & Features

  • Efficient Quantization: Utilizes a uniform PTQ method with a group size of 16, quantizing all 2D linear matrices.
  • Reduced Information Content: Achieves a significant reduction in the information content of weights (5.05 bpw), beneficial for specialized hardware inference.
  • transformers Compatibility: The published weights are dequantized to FP16, so the model loads with the standard Hugging Face transformers library without custom code.
  • Selective Quantization: Key components such as lm_head, token embeddings, and *_norm layers are kept in FP16 to preserve model quality (see the sketch after this list).
  • Codec Availability: Produced using the tritllm-codec from Entrit, based on research by Stentzel (2026).
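
The selective rule above can be expressed as a small filter over the checkpoint: only 2D linear weight matrices are quantized group-wise (and dequantized back to FP16), while lm_head, token embeddings, and norm weights pass through untouched. The name patterns and helpers below are hypothetical, following the Mistral module naming in transformers; they are not the codec's exact filter.

```python
import torch

# Hypothetical sketch of the selective, group-wise quantization described above.
# Only 2D linear weight matrices are quantized; lm_head, token embeddings, and
# *_norm weights stay in FP16. The name patterns follow the Mistral module
# naming in transformers and are an assumption, not the codec's exact filter.

SKIP_PATTERNS = ("lm_head", "embed_tokens", "norm")
GROUP_SIZE = 16
QMAX = 13  # 27-level balanced grid: integers in [-13, +13]

def should_quantize(name: str, tensor: torch.Tensor) -> bool:
    if tensor.ndim != 2:                       # only 2D linear matrices
        return False
    return not any(p in name for p in SKIP_PATTERNS)

def fake_quantize_groupwise(w: torch.Tensor) -> torch.Tensor:
    """Quantize each group of 16 weights to the 27-level grid, then back to FP16."""
    rows, cols = w.shape                       # cols must be divisible by GROUP_SIZE
    g = w.float().reshape(rows, cols // GROUP_SIZE, GROUP_SIZE)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / QMAX
    q = torch.round(g / scale).clamp(-QMAX, QMAX)
    return (q * scale).reshape(rows, cols).to(torch.float16)

def fake_quantize_state_dict(state_dict):
    """Apply the rule above across a full checkpoint (quantize-dequantize)."""
    return {
        name: fake_quantize_groupwise(t) if should_quantize(name, t) else t
        for name, t in state_dict.items()
    }
```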

When to Use This Model

  • Resource-Constrained Deployment: Ideal for scenarios where memory footprint and inference speed are critical, especially on hardware designed to consume packed trit formats directly.
  • Research in Quantization: Useful for researchers exploring balanced ternary quantization and its impact on LLM performance.
  • Standard LLM Tasks: Can be used for general-purpose language generation and understanding tasks, leveraging the underlying Mistral-7B-v0.3 architecture with quantization benefits (see the usage example below).
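
Because the published weights are already dequantized to FP16, the checkpoint loads like any other Mistral-style model. A minimal generation example (the repository id is taken from this card; everything else is standard transformers usage):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Mistral-7B-v0.3-trit-uniform-d3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # weights are stored dequantized to FP16
    device_map="auto",
)

prompt = "Balanced ternary quantization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```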