Rev124/llama-3-pruned
Rev124/llama-3-pruned is a 3.2 billion parameter multilingual large language model developed by Meta, based on the Llama 3.2 architecture. This instruction-tuned model is optimized for multilingual dialogue, agentic retrieval, and summarization tasks, outperforming many open-source and closed chat models on industry benchmarks. It supports a context length of up to 32768 tokens and is designed for commercial and research use in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Model Overview
Rev124/llama-3-pruned is an instruction-tuned, multilingual large language model with 3.2 billion parameters, part of Meta's Llama 3.2 collection. It is built on an optimized transformer architecture and aligned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model was pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023. A key aspect of its development is knowledge distillation: logits from the larger Llama 3.1 8B and 70B models were used as token-level targets during pretraining, particularly after pruning, to recover performance.
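The distillation step described above trains the pruned student to match the teacher's per-token output distribution. A minimal NumPy sketch of such a logit-distillation loss follows; the function names, temperature scaling, and exact formulation are illustrative assumptions, not Meta's actual (non-public) training code:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary, averaged across tokens.

    Softened targets (temperature > 1) are scaled by T^2, as is standard
    in logit distillation so gradients keep a comparable magnitude.
    """
    p = softmax(teacher_logits, temperature)              # teacher soft targets
    log_q = np.log(softmax(student_logits, temperature))  # student log-probs
    log_p = np.log(p)
    kl = (p * (log_p - log_q)).sum(axis=-1)               # per-token KL divergence
    return float(kl.mean() * temperature ** 2)
```

The loss is zero when the student reproduces the teacher's logits exactly and strictly positive otherwise, which is what drives the pruned model back toward the teacher's behavior.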
Key Capabilities & Features
- Multilingual Dialogue: Optimized for multilingual chat and agentic applications, including retrieval and summarization.
- Quantization Techniques: Offers advanced quantization schemes, including 4-bit groupwise quantization for weights and 8-bit dynamic quantization for activations, as well as Quantization-Aware Training (QAT) with LoRA and SpinQuant. These significantly reduce model size and improve inference speed in constrained environments such as mobile devices.
- Performance: Demonstrates strong performance across various benchmarks, including MMLU, AGIEval, and ARC-Challenge, with instruction-tuned versions showing improved scores in general reasoning, math, and instruction following.
- Long Context: Supports a context length of up to 32768 tokens, enabling it to process long documents and extended multi-turn conversations.
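The 4-bit groupwise weight quantization mentioned above can be illustrated with a short NumPy sketch. This is a generic symmetric scheme (one floating-point scale per group of weights); the group size and function names are our assumptions, not the exact scheme shipped with the model:

```python
import numpy as np

def quantize_4bit_groupwise(weights, group_size=32):
    """Symmetric 4-bit groupwise quantization: one fp32 scale per group.

    Each group of `group_size` consecutive weights shares a scale chosen so
    the group's largest magnitude maps to the positive int4 limit (7).
    """
    groups = weights.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)        # avoid divide-by-zero
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales, shape):
    """Recover approximate fp32 weights from int4 codes and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(shape)
```

Because each group gets its own scale, outlier weights only degrade precision within their own group, which is why groupwise schemes preserve accuracy better than a single per-tensor scale at 4-bit precision.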
Intended Use Cases
- Assistant-like Chatbots: Ideal for conversational AI applications requiring multilingual support.
- Agentic Applications: Well-suited for tasks involving knowledge retrieval, summarization, and query/prompt rewriting.
- On-Device Deployment: Quantized versions are specifically designed for deployment in environments with limited compute resources, such as mobile AI-powered writing assistants.
Noteworthy Aspects
- Responsible AI: Meta emphasizes a three-pronged strategy for trust and safety, including developer enablement, protection against adversarial users, and community safeguards. The model is not designed for isolated deployment and requires system-level safeguards.
- Energy Efficiency: Training involved 916k GPU hours, with Meta achieving net-zero greenhouse gas emissions for training due to 100% renewable energy matching.
- License: Governed by the Llama 3.2 Community License, a custom commercial license.