Entrit/Llama-3.1-8B-trit-uniform-d3

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: May 4, 2026 · License: llama3.1 · Architecture: Transformer

Entrit/Llama-3.1-8B-trit-uniform-d3 is an 8-billion-parameter Llama 3.1 model developed by Entrit, featuring balanced ternary post-training quantization. Each weight is encoded with three balanced-ternary digits (depth d = 3), giving 3^3 = 27 levels per weight and an effective 5.05 bits per weight, a substantial reduction in the information content of the quantized matrices. It is a research artifact primarily intended to demonstrate efficient quantization techniques for large language models.
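As a rough sanity check on the bits-per-weight figure, the trits alone carry log2(27) ≈ 4.755 bits; attributing the remaining ~0.3 bits to per-group metadata such as scales is an assumption, not something the card states:

```python
import math

# Three balanced-ternary digits (trits) per weight give 3**3 = 27 levels.
levels = 3 ** 3
raw_bpw = math.log2(levels)      # information content of the trits alone
print(round(raw_bpw, 3))         # 4.755

# The reported figure is 5.05 bpw; the gap plausibly covers per-group
# metadata such as FP16 scales (assumption, not documented in the card).
overhead = 5.05 - raw_bpw
print(round(overhead, 3))        # 0.295
```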


Entrit/Llama-3.1-8B-trit-uniform-d3: A Quantized Llama 3.1 Model

This model, developed by Entrit, is a balanced ternary post-training quantized version of Meta's Llama-3.1-8B. It utilizes a unique quantization scheme to reduce the model's information content while maintaining compatibility with standard transformers libraries.

Key Quantization Details

  • Source Model: meta-llama/Llama-3.1-8B
  • Quantization Method: Uniform Post-Training Quantization (PTQ) with balanced ternary levels.
  • Depth (d): 3 trits per weight, i.e. 3^3 = 27 levels.
  • Bits per Weight: 5.05 bits per weight, roughly a 3× reduction in information content from the 16-bit FP16 source.
  • Quantized Layers: All 2D linear matrices are quantized, while lm_head, token embeddings, and all *_norm layers remain in FP16 for stability.
  • Codec: Produced using the tritllm v2 codec, detailed in the research paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
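The scheme above can be sketched as follows. This is a minimal illustration of uniform PTQ onto 27 balanced-ternary levels, not the tritllm v2 codec itself; the per-matrix scale and the function names are assumptions for the sake of the example:

```python
import numpy as np

def quantize_uniform_ternary(w, depth=3):
    """Uniform PTQ onto balanced-ternary levels (illustrative sketch).

    With depth d, integer codes lie in [-(3**d - 1) / 2, +(3**d - 1) / 2],
    i.e. 27 levels in [-13, 13] for d = 3.
    """
    max_code = (3 ** depth - 1) // 2            # 13 for depth 3
    scale = np.abs(w).max() / max_code          # one scale per matrix (assumption)
    codes = np.clip(np.round(w / scale), -max_code, max_code).astype(np.int8)
    return codes, scale

def to_trits(code, depth=3):
    """Decompose an integer code into balanced-ternary digits in {-1, 0, 1}."""
    trits = []
    for _ in range(depth):
        r = ((code + 1) % 3) - 1                # balanced remainder
        trits.append(r)
        code = (code - r) // 3
    return trits                                # least-significant trit first

# Example: quantize a tiny matrix, then dequantize it back to floats.
w = np.array([[0.42, -0.17], [0.03, -0.91]], dtype=np.float32)
codes, scale = quantize_uniform_ternary(w)
w_hat = codes * scale                           # dequantized FP weights
```

In this sketch each code decomposes into exactly `depth` trits, which is where the packed trit format and the log2(27) ≈ 4.75-bit information content per weight come from.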

Usage and Compatibility

Because the weights are stored dequantized for compatibility with the standard transformers library, the on-disk size remains close to that of the FP16 source; the 5.05-bpw figure reflects the model's underlying information content rather than its file size. That distinction matters for hardware designed to process the packed trit format directly, as explored in Entrit/tritllm-kernel.
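Since the checkpoint is stored dequantized, it should load like any standard Llama 3.1 checkpoint. This is a sketch under that assumption; the exact loading behavior is inferred from the compatibility note above, not documented separately:

```python
# Loads like an ordinary FP16 Llama checkpoint, since the quantized
# weights are stored dequantized for transformers compatibility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Entrit/Llama-3.1-8B-trit-uniform-d3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Balanced ternary is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```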

Licensing and Attribution

As a research artifact, this model adheres to the Llama 3.1 Community License Agreement and Meta's Acceptable Use Policy. Users are required to display "Built with Llama" attribution for any redistributed or publicly demonstrated derivatives.