alpindale/Llama-3.2-3B-Instruct

Parameters: 3.2B
Precision: BF16
Context Length: 32768
Released: Sep 25, 2024
License: llama3.2

Model Overview

alpindale/Llama-3.2-3B-Instruct is a 3.21-billion-parameter instruction-tuned model from Meta's Llama 3.2 family, designed for multilingual text-in/text-out generation. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability and supports a context length of 32,768 tokens. The model was pretrained on a new mix of publicly available online data, up to 9 trillion tokens, with a knowledge cutoff of December 2023, and incorporates knowledge distillation from the larger Llama 3.1 models. Alignment was performed through Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
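The GQA mechanism mentioned above lets a small number of key/value heads be shared across groups of query heads, shrinking the KV cache relative to full multi-head attention. A minimal NumPy sketch of the idea (head counts and dimensions here are illustrative, not read from the model's actual configuration):

```python
import numpy as np

# Illustrative sizes only -- NOT the model's real configuration.
n_q_heads, n_kv_heads, head_dim, seq = 8, 2, 16, 4
group = n_q_heads // n_kv_heads  # query heads served by each KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))  # fewer KV heads
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Expand K/V so each query head attends against its group's shared KV head
k_exp = np.repeat(k, group, axis=0)
v_exp = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per head
scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_exp  # shape: (n_q_heads, seq, head_dim)
```

Only the K/V projections and cache scale with `n_kv_heads`, which is why GQA cuts memory traffic at inference time while keeping the full complement of query heads.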

Key Capabilities

  • Multilingual Dialogue: Optimized for multilingual chat, agentic retrieval, and summarization tasks, with official support for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Instruction Following: Achieves strong performance on instruction-following benchmarks (e.g., 77.4 on IFEval for the 3B model).
  • Mathematical Reasoning: Demonstrates solid capabilities in math tasks, scoring 77.7 on GSM8K (CoT) and 47.3 on MATH (CoT).
  • Long Context Handling: Features a 32K-token context window, with strong recall on Needle-in-a-Haystack and solid results on InfiniteBench long-context tasks.
  • Resource Efficiency: The 3B size is suitable for deployment in constrained environments, such as mobile devices, offering a balance of capability and efficiency.

Intended Use Cases

This model is intended for commercial and research use, particularly assistant-like chat, agentic systems (such as knowledge retrieval and summarization), mobile AI-powered writing assistants, and query/prompt rewriting. Developers are encouraged to implement additional safety guardrails, especially for deployments in constrained environments, and can leverage Meta's provided safeguards such as Llama Guard.
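For the assistant-like chat use case above, inputs follow Meta's Llama 3 instruct prompt format. A minimal sketch of assembling a single-turn prompt by hand (in practice, prefer `tokenizer.apply_chat_template` from `transformers`, which applies the exact template shipped with the model):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt using the Llama 3.x special tokens.

    The trailing assistant header cues the model to generate its reply.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Summarize grouped-query attention in one line.",
)
```

Multi-turn conversations repeat the `user`/`assistant` header-plus-`<|eot_id|>` pattern; generation is typically stopped on `<|eot_id|>`.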