Entrit/Llama-3.1-8B-trit-uniform-d2

Text generation · Model size: 8B · Quant: FP8 · Context length: 8k · Published: May 4, 2026 · License: llama3.1 · Architecture: Transformer

Entrit/Llama-3.1-8B-trit-uniform-d2 is an 8-billion-parameter Llama 3.1 model from Entrit, featuring balanced ternary post-training quantization at depth d=2, which yields 3.47 bits per weight. This research artifact is aimed at exploring efficient inference on specialized hardware that can directly consume its packed trit format; it remains compatible with standard `transformers` by dequantizing to FP16 for general use.


Model Overview

Entrit/Llama-3.1-8B-trit-uniform-d2 is a research artifact developed by Entrit, based on the meta-llama/Llama-3.1-8B model. Its primary innovation lies in its balanced ternary post-training quantization (PTQ), applying a depth of d=2 (9 levels per weight) to achieve an information content of 3.47 bits per weight. This quantization method is detailed in the upcoming paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).

Key Quantization Details

  • Source Model: meta-llama/Llama-3.1-8B
  • Quantization Method: Uniform PTQ with balanced ternary quantization (d=2, 9 levels)
  • Bits per Weight: 3.47
  • Quantized Layers: All 2D linear matrices
  • Kept in FP16: lm_head, token embeddings, and all `*_norm` layers
  • Codec: Produced using tritllm v2 (Entrit/tritllm-codec)
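As a rough illustration of the scheme described above (not the actual tritllm v2 codec, whose grouping, scale search, and storage format are not documented here), a depth-d balanced ternary quantizer maps each weight to one of 3^d symmetric integer levels; for d=2 that is 9 levels, {-4, ..., +4}, times a scale:

```python
# Illustrative sketch only: uniform balanced ternary PTQ at depth d.
# The real Entrit/tritllm-codec likely uses per-group scales and a
# different rounding/search strategy.

def quantize_bt(weights, d=2):
    """Map each float weight to one of 3**d symmetric integer levels."""
    half = (3 ** d - 1) // 2                  # d=2 -> half=4, levels -4..+4 (9 total)
    peak = max(abs(w) for w in weights) or 1.0
    scale = peak / half                       # single per-tensor scale (a simplification)
    q = [max(-half, min(half, round(w / scale))) for w in weights]
    return q, scale

def dequantize_bt(q, scale):
    """Recover float weights, as done when loading for stock transformers."""
    return [v * scale for v in q]

weights = [0.91, -0.25, 0.0, 0.47, -0.88]
q, scale = quantize_bt(weights, d=2)
restored = dequantize_bt(q, scale)
# Every code lies in the 9-level set {-4, ..., +4}; the peak weight
# maps exactly to the top level, so it round-trips losslessly.
```

The d=2 depth means each weight is representable by two balanced trits, which is where the sub-4-bit information content comes from.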

Usage and Compatibility

While the model's core is quantized, it is designed for stock transformers compatibility by dequantizing weights to FP16 on load. This means its on-disk size is similar to the original FP16 model. The 3.47-bpw figure is most relevant for inference on hardware capable of directly processing the packed trit format, which can be explored with the Entrit/tritllm-kernel.
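For hardware or kernels that consume the packed format directly, one common way to store trits is five to a byte (3^5 = 243 ≤ 256, i.e. 1.6 bits per trit). The actual layout used by Entrit/tritllm-kernel is not documented here, so the following is an assumption-laden sketch of the packing idea, not the shipped format:

```python
# Hypothetical trit packing: five balanced trits per byte (3**5 = 243
# fits in one byte). The real packed layout may differ.

def pack_trits(trits):
    """Pack balanced trits {-1, 0, +1} into bytes, five per byte."""
    padded = trits + [0] * (-len(trits) % 5)     # zero-pad to a multiple of 5
    out = bytearray()
    for i in range(0, len(padded), 5):
        v = 0
        for t in padded[i:i + 5]:
            v = v * 3 + (t + 1)                  # map {-1,0,+1} -> {0,1,2}
        out.append(v)
    return bytes(out)

def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original trit count."""
    trits = []
    for b in data:
        group = []
        for _ in range(5):
            group.append(b % 3 - 1)              # map {0,1,2} -> {-1,0,+1}
            b //= 3
        trits.extend(reversed(group))
    return trits[:n]

# At d=2 each weight is two trits, e.g. the code +4 is (+1, +1).
codes = [1, 1, -1, 0, 0, 1, -1, -1]              # four d=2 weight codes
packed = pack_trits(codes)                       # 8 trits -> 2 bytes
```

A kernel consuming this layout would decode trit pairs back into the 9-level codes and apply the scale in the matmul epilogue, avoiding the FP16 dequantization pass entirely.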

Licensing

This model is a research artifact and is subject to the Llama 3.1 Community License Agreement and Meta's Acceptable Use Policy. Commercial use is restricted by these terms.