Rev124/llama-3-pruned
Rev124/llama-3-pruned is a 3.2 billion parameter multilingual large language model developed by Meta, based on the Llama 3.2 architecture. This instruction-tuned model is optimized for multilingual dialogue, agentic retrieval, and summarization tasks, outperforming many open-source and closed chat models on industry benchmarks. It supports a context length of up to 32768 tokens and is designed for commercial and research use in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Model Overview
Rev124/llama-3-pruned is an instruction-tuned, multilingual large language model with 3.2 billion parameters, part of Meta's Llama 3.2 collection. It is built on an optimized transformer architecture and aligned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model was pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023. A key aspect of its development is knowledge distillation: logits from the larger Llama 3.1 8B and 70B models were used as token-level targets during pretraining, particularly after pruning, to recover performance.
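The distillation step described above trains the pruned student to match the teacher's per-token output distribution. A minimal NumPy sketch of such a logit-distillation loss follows; the function names, temperature scaling, and exact formulation are illustrative assumptions, not Meta's actual (non-public) training code:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary, averaged across tokens.

    Softened targets (temperature > 1) are scaled by T^2, as is standard
    in logit distillation so gradients keep a comparable magnitude.
    """
    p = softmax(teacher_logits, temperature)              # teacher soft targets
    log_q = np.log(softmax(student_logits, temperature))  # student log-probs
    log_p = np.log(p)
    kl = (p * (log_p - log_q)).sum(axis=-1)               # per-token KL divergence
    return float(kl.mean() * temperature ** 2)
```

The loss is zero when the student reproduces the teacher's logits exactly and strictly positive otherwise, which is what drives the pruned model back toward the teacher's behavior.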
Key Capabilities & Features
- Multilingual Dialogue: Optimized for multilingual chat and agentic applications, including retrieval and summarization.
- Quantization Techniques: Offers advanced quantization schemes, including 4-bit groupwise quantization for weights and 8-bit dynamic quantization for activations, as well as Quantization-Aware Training (QAT) with LoRA and SpinQuant. These significantly reduce model size and improve inference speed in constrained environments such as mobile devices.
- Performance: Demonstrates strong performance across various benchmarks, including MMLU, AGIEval, and ARC-Challenge, with instruction-tuned versions showing improved scores in general reasoning, math, and instruction following.
- Long Context: Supports a context length of up to 32768 tokens, enabling it to process long documents and extended multi-turn conversations.
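The 4-bit groupwise weight quantization mentioned above can be illustrated with a short NumPy sketch. This is a generic symmetric scheme (one floating-point scale per group of weights); the group size and function names are our assumptions, not the exact scheme shipped with the model:

```python
import numpy as np

def quantize_4bit_groupwise(weights, group_size=32):
    """Symmetric 4-bit groupwise quantization: one fp32 scale per group.

    Each group of `group_size` consecutive weights shares a scale chosen so
    the group's largest magnitude maps to the positive int4 limit (7).
    """
    groups = weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)        # avoid divide-by-zero
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    """Recover approximate fp32 weights from int4 codes and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(shape)
```

Because each group gets its own scale, outlier weights only degrade precision within their own group, which is why groupwise schemes preserve accuracy better than a single per-tensor scale at 4-bit precision.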
Intended Use Cases
- Assistant-like Chatbots: Ideal for conversational AI applications requiring multilingual support.
- Agentic Applications: Well-suited for tasks involving knowledge retrieval, summarization, and query/prompt rewriting.
- On-Device Deployment: Quantized versions are specifically designed for deployment in environments with limited compute resources, such as mobile AI-powered writing assistants.
Noteworthy Aspects
- Responsible AI: Meta emphasizes a three-pronged strategy for trust and safety, including developer enablement, protection against adversarial users, and community safeguards. The model is not designed for isolated deployment and requires system-level safeguards.
- Energy Efficiency: Training involved 916k GPU hours, with Meta achieving net-zero greenhouse gas emissions for training due to 100% renewable energy matching.
- License: Governed by the Llama 3.2 Community License, a custom commercial license.