Name: Entrit/Llama-3.1-8B-trit-uniform-d1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Entrit

Model Overview

Entrit/Llama-3.1-8B-trit-uniform-d1 is a research artifact from Entrit that applies balanced ternary post-training quantization (PTQ) to Meta's Llama-3.1-8B. The 8-billion-parameter model is quantized at depth d=1, so each weight takes one of 3 levels, for a reported 1.88 bits per weight. The method is uniform PTQ applied to all 2D linear matrices; lm_head, the token embeddings, and the *_norm layers remain in FP16.
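To make the idea of "3 levels per weight" concrete, here is a minimal sketch of uniform ternary quantization in plain Python. It is an illustration under stated assumptions, not the tritllm v2 codec: the function names are hypothetical, and the choice of one max-abs scale per row is an assumption, since the card does not say how scales are chosen or grouped.

```python
def quantize_ternary(weights):
    """Map each weight to a trit in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the largest absolute weight in the row,
    so the three uniform levels are -scale, 0, and +scale.
    """
    scale = max((abs(w) for w in weights), default=0.0)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    trits = []
    for w in weights:
        x = w / scale
        # Round to the nearest of the three levels -1, 0, +1.
        trits.append(1 if x > 0.5 else -1 if x < -0.5 else 0)
    return trits, scale


def dequantize_ternary(trits, scale):
    """Reconstruct approximate weights from trits and the shared scale."""
    return [t * scale for t in trits]


row = [0.8, -0.1, 0.3, -0.9, 0.05]
trits, scale = quantize_ternary(row)
restored = dequantize_ternary(trits, scale)
print(trits, scale, restored)
```

Small weights collapse to 0 and large ones snap to plus or minus the scale, which is where the accuracy cost of such aggressive quantization comes from.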

Key Characteristics

Base Model: Derived from meta-llama/Llama-3.1-8B.
Quantization: Balanced ternary PTQ (3 levels per weight) resulting in 1.88 bits per weight.
Codec: Utilizes the tritllm v2 codec, detailed in the paper "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026).
Compatibility: Weights are stored dequantized to FP16 so the checkpoint loads with the stock transformers library. As a result, the on-disk size matches the FP16 source rather than shrinking.
Efficiency Focus: The 1.88-bpw figure describes the packed trit representation. It indicates the potential for efficient inference on specialized hardware designed to process packed trit formats directly.
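As an illustration of what a "packed trit format" can look like, the sketch below stores five balanced trits per byte using base-3 digits, which works because 3^5 = 243 fits in one byte. This is a generic packing scheme with hypothetical function names, not the tritllm v2 layout, which this card does not describe.

```python
TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five trits fit in one byte


def pack_trits(trits):
    """Pack balanced trits (-1, 0, +1) five per byte as base-3 digits."""
    packed = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        # The first trit of each group becomes the least significant digit.
        for t in reversed(trits[i:i + TRITS_PER_BYTE]):
            value = value * 3 + (t + 1)  # shift -1, 0, +1 to digits 0, 1, 2
        packed.append(value)
    return bytes(packed)


def unpack_trits(packed, count):
    """Inverse of pack_trits; count trims the padding in the last byte."""
    trits = []
    for byte in packed:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


original = [1, 0, 0, -1, 0, 1, -1]
packed = pack_trits(original)
roundtrip = unpack_trits(packed, len(original))
print(len(packed), roundtrip == original)
```

This particular scheme costs 8 / 5 = 1.6 bits per trit before any scales or other metadata are counted. The card does not break down how its 1.88-bpw figure is composed.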

Use Cases

Research and Development: Ideal for exploring and experimenting with highly quantized LLMs and their performance characteristics.
Hardware Optimization: Suitable for developers working on custom hardware or inference engines that can leverage balanced ternary representations.
Resource-Constrained Deployment: Offers a path towards more memory-efficient LLM deployment, particularly when paired with compatible hardware.
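For a rough sense of the memory saving at stake, the arithmetic below compares FP16 storage with storage at the reported 1.88 bits per weight. It is a back-of-the-envelope sketch: the parameter count is rounded to 8 billion, and the estimate treats every weight as quantized even though lm_head, the token embeddings, and the norm layers stay in FP16, so the real packed size would be somewhat larger.

```python
def weight_bytes(num_params, bits_per_weight):
    """Storage needed for num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


NUM_PARAMS = 8_000_000_000  # assumption: rounded "8 billion" from the card

fp16_gb = weight_bytes(NUM_PARAMS, 16) / 1e9
trit_gb = weight_bytes(NUM_PARAMS, 1.88) / 1e9
print(f"FP16: {fp16_gb:.2f} GB, packed ternary: {trit_gb:.2f} GB")
```

The published checkpoint itself is stored as FP16, so this reduction applies only to a packed representation on compatible hardware or inference engines.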
