Model Overview
alpindale/Llama-3.2-3B-Instruct is a 3.21-billion-parameter instruction-tuned model from Meta's Llama 3.2 family, designed for multilingual text-in/text-out generation. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and supports a context length of 32,768 tokens. The model was pretrained on up to 9 trillion tokens from a new mix of publicly available online data, with a knowledge cutoff of December 2023. It incorporates knowledge distillation from the larger Llama 3.1 models and was aligned via Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
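The GQA mechanism mentioned above lets several query heads read from a single shared key/value head, shrinking the KV cache. The sketch below illustrates the head-to-group mapping; the specific head counts (24 query heads, 8 KV heads) are assumptions drawn from the published Llama 3.2 3B configuration, not from this card.

```python
# Sketch of how Grouped-Query Attention (GQA) shares key/value heads
# across query heads. Head counts are assumptions based on the
# Llama 3.2 3B configuration (24 query heads, 8 KV heads).

NUM_QUERY_HEADS = 24
NUM_KV_HEADS = 8

def kv_head_for_query(q_head: int,
                      n_q: int = NUM_QUERY_HEADS,
                      n_kv: int = NUM_KV_HEADS) -> int:
    """Map a query head index to the KV head whose cache it reads."""
    group_size = n_q // n_kv  # query heads per KV head (3 here)
    return q_head // group_size

if __name__ == "__main__":
    groups: dict[int, list[int]] = {}
    for q in range(NUM_QUERY_HEADS):
        groups.setdefault(kv_head_for_query(q), []).append(q)
    for kv, qs in sorted(groups.items()):
        print(f"KV head {kv} serves query heads {qs}")
```

With this grouping, the KV cache stores 8 heads instead of 24, a threefold reduction relative to standard multi-head attention at the same query-head count.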
Key Capabilities
- Multilingual Dialogue: Optimized for multilingual chat, agentic retrieval, and summarization tasks, with official support for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Instruction Following: Achieves strong performance on instruction-following benchmarks (e.g., 77.4 on IFEval for the 3B model).
- Mathematical Reasoning: Demonstrates solid capabilities in math tasks, scoring 77.7 on GSM8K (CoT) and 47.3 on MATH (CoT).
- Long Context Handling: Features a 32K-token context window, with strong recall on Needle-in-a-Haystack evaluations and solid results on InfiniteBench long-context tasks.
- Resource Efficiency: The 3B size is suitable for deployment in constrained environments, such as mobile devices, offering a balance of capability and efficiency.
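The long-context and efficiency points above interact through KV-cache memory, which grows linearly with sequence length. The back-of-the-envelope calculation below estimates the cache footprint at the full 32K context; the architecture numbers (28 layers, 8 KV heads, head dimension 128) are assumptions taken from the published Llama 3.2 3B configuration.

```python
# Back-of-the-envelope KV-cache size at long context.
# Architecture numbers are assumptions from the Llama 3.2 3B config:
# 28 layers, 8 KV heads (thanks to GQA), head dimension 128.

NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16 / bf16

def kv_cache_bytes(seq_len: int) -> int:
    """Bytes needed to cache keys AND values for every layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * seq_len * BYTES_PER_VALUE

if __name__ == "__main__":
    gib = kv_cache_bytes(32768) / 2**30
    print(f"KV cache at 32K context: {gib:.1f} GiB")  # 3.5 GiB under these assumptions
```

Under these assumptions a full 32K-token session costs about 3.5 GiB of cache on top of the weights; with 24 KV heads (no GQA) it would be roughly three times that, which is why GQA matters for constrained deployments.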
Intended Use Cases
This model is intended for commercial and research use: assistant-style chat, agentic tasks such as knowledge retrieval and summarization, mobile AI-powered writing assistance, and query/prompt rewriting. Developers are encouraged to implement additional safety guardrails, especially for constrained environments, and can leverage Meta's provided safeguards such as Llama Guard.
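For the chat use cases above, the model expects conversations rendered in the Llama 3 family's instruct prompt format. In practice you would call the tokenizer's `apply_chat_template()` from the transformers library rather than building strings by hand; the sketch below hand-assembles the format purely to show its layout.

```python
# Minimal sketch of the Llama 3 family chat prompt layout used by this
# instruct model. For real use, prefer tokenizer.apply_chat_template()
# from the transformers library; this is illustration only.

def build_prompt(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} messages into a Llama 3 prompt."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Trailing assistant header cues the model to generate its turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

if __name__ == "__main__":
    prompt = build_prompt([
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize GQA in one sentence."},
    ])
    print(prompt)
```

Each turn is delimited by `<|start_header_id|>`/`<|end_header_id|>` around the role name and terminated with `<|eot_id|>`; generation stops when the model emits its own `<|eot_id|>`.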