Entrit/Llama-3.1-8B-trit-uniform-d3

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: May 4, 2026 · License: llama3.1 · Architecture: Transformer

Entrit/Llama-3.1-8B-trit-uniform-d3 is an 8-billion-parameter Llama 3.1 model developed by Entrit, featuring balanced ternary post-training quantization. Each weight is encoded with three balanced-ternary digits (depth d = 3), giving 3^3 = 27 levels per weight and an effective 5.05 bits per weight, a substantial reduction in the information content of the quantized matrices. It is a research artifact primarily intended to demonstrate efficient quantization techniques for large language models.
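As a rough sanity check on the bits-per-weight figure, the trits alone carry log2(27) ≈ 4.755 bits; attributing the remaining ~0.3 bits to per-group metadata such as scales is an assumption, not something the card states:

```python
import math

# Three balanced-ternary digits (trits) per weight give 3**3 = 27 levels.
levels = 3 ** 3
raw_bpw = math.log2(levels)      # information content of the trits alone
print(round(raw_bpw, 3))         # 4.755

# The reported figure is 5.05 bpw; the gap plausibly covers per-group
# metadata such as FP16 scales (assumption, not documented in the card).
overhead = 5.05 - raw_bpw
print(round(overhead, 3))        # 0.295
```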


Entrit/Llama-3.1-8B-trit-uniform-d3: A Quantized Llama 3.1 Model

This model, developed by Entrit, is a balanced ternary post-training quantized version of Meta's Llama-3.1-8B. It utilizes a unique quantization scheme to reduce the model's information content while maintaining compatibility with standard transformers libraries.

Key Quantization Details

  • Source Model: meta-llama/Llama-3.1-8B
  • Quantization Method: Uniform Post-Training Quantization (PTQ) with balanced ternary levels.
  • Depth (d): 3 trits per weight, i.e. 3^3 = 27 levels.
  • Bits per Weight: 5.05 bits per weight, roughly a 3× reduction in information content from the 16-bit FP16 source.
  • Quantized Layers: All 2D linear matrices are quantized, while lm_head, token embeddings, and all *_norm layers remain in FP16 for stability.
  • Codec: Produced using the tritllm v2 codec, detailed in the research paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
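The scheme above can be sketched as follows. This is a minimal illustration of uniform PTQ onto 27 balanced-ternary levels, not the tritllm v2 codec itself; the per-matrix scale and the function names are assumptions for the sake of the example:

```python
import numpy as np

def quantize_uniform_ternary(w, depth=3):
    """Uniform PTQ onto balanced-ternary levels (illustrative sketch).

    With depth d, integer codes lie in [-(3**d - 1) / 2, +(3**d - 1) / 2],
    i.e. 27 levels in [-13, 13] for d = 3.
    """
    max_code = (3 ** depth - 1) // 2            # 13 for depth 3
    scale = np.abs(w).max() / max_code          # one scale per matrix (assumption)
    codes = np.clip(np.round(w / scale), -max_code, max_code).astype(np.int8)
    return codes, scale

def to_trits(code, depth=3):
    """Decompose an integer code into balanced-ternary digits in {-1, 0, 1}."""
    trits = []
    for _ in range(depth):
        r = ((code + 1) % 3) - 1                # balanced remainder
        trits.append(r)
        code = (code - r) // 3
    return trits                                # least-significant trit first

# Example: quantize a tiny matrix, then dequantize it back to floats.
w = np.array([[0.42, -0.17], [0.03, -0.91]], dtype=np.float32)
codes, scale = quantize_uniform_ternary(w)
w_hat = codes * scale                           # dequantized FP weights
```

In this sketch each code decomposes into exactly `depth` trits, which is where the packed trit format and the log2(27) ≈ 4.75-bit information content per weight come from.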

Usage and Compatibility

Because the weights are stored dequantized for compatibility with the standard transformers library, the on-disk size remains close to that of the FP16 source; the 5.05-bpw figure reflects the model's underlying information content rather than its file size. That distinction matters for hardware designed to process the packed trit format directly, as explored in Entrit/tritllm-kernel.
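Since the checkpoint is stored dequantized, it should load like any standard Llama 3.1 checkpoint. This is a sketch under that assumption; the exact loading behavior is inferred from the compatibility note above, not documented separately:

```python
# Loads like an ordinary FP16 Llama checkpoint, since the quantized
# weights are stored dequantized for transformers compatibility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Llama-3.1-8B-trit-uniform-d3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Balanced ternary is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```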

Licensing and Attribution

As a research artifact, this model adheres to the Llama 3.1 Community License Agreement and Meta's Acceptable Use Policy. Users are required to display "Built with Llama" attribution for any redistributed or publicly demonstrated derivatives.