meta-llama/Llama-3.2-3B-Instruct

  • Parameters: 3.21B
  • Precision: BF16
  • Context length: 32,768 tokens
  • License: llama3.2
  • Availability: Hugging Face (public listing, gated access)
Overview

meta-llama/Llama-3.2-3B-Instruct is a 3.21 billion parameter instruction-tuned model from Meta's Llama 3.2 family, optimized for multilingual dialogue and agentic applications. It uses an auto-regressive transformer architecture, aligned for helpfulness and safety through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). The model supports a 32,768-token context length and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
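As an instruction-tuned model, it expects conversations rendered with the Llama 3 chat template. In practice `tokenizer.apply_chat_template` handles this automatically; the sketch below builds the prompt string by hand, purely to illustrate the published Llama 3 special-token format (`build_prompt` is an illustrative helper, not part of any library):

```python
# Illustrative sketch: rendering a chat into a Llama 3-style prompt string.
# The special tokens follow the published Llama 3 instruct format; real code
# should prefer tokenizer.apply_chat_template from Hugging Face Transformers.

def build_prompt(messages):
    """Render a list of {"role", "content"} dicts into a Llama 3-style prompt."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open an assistant header so the model generates the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Llama 3.2 family in one sentence."},
]
prompt = build_prompt(messages)
```

The system turn is optional; each turn is delimited by its role header and a closing `<|eot_id|>` token.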

Key Capabilities

  • Multilingual Performance: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader set of languages.
  • Optimized for Dialogue: Specifically designed for assistant-like chat, agentic retrieval, and summarization tasks.
  • Quantization Options: Available in BF16, SpinQuant, and QLoRA versions, offering significant improvements in inference speed (up to 2.6x decode speed) and reduced memory footprint (up to 60.3% smaller model size) for constrained environments like mobile devices.
  • Robust Safety Measures: Developed with a three-pronged strategy for trust and safety, including developer enablement, protection against adversarial users, and community misuse prevention.
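To put the quantization numbers in context, a back-of-the-envelope estimate of weight memory helps: at 2 bytes per parameter, BF16 weights for 3.21B parameters occupy roughly 6.4 GB, while an idealized 4-bit layout would need about 1.6 GB. The reported 60.3% size reduction is smaller than this ideal 75% because, in practice, some layers remain in higher precision. A minimal sketch (the 3.21e9 parameter count is taken from the overview above; everything else is plain arithmetic):

```python
# Back-of-the-envelope weight-memory estimate for a 3.21B-parameter model,
# comparing BF16 (16 bits/param) with an idealized 4-bit quantization.
# Real footprints also include activations and the KV cache.

PARAMS = 3.21e9  # parameter count from the model card


def weight_memory_gb(bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(16)        # ~6.42 GB
int4_gb = weight_memory_gb(4)         # ~1.61 GB
reduction = 1 - int4_gb / bf16_gb     # 0.75, i.e. 75% smaller weights
```

This is a rough upper bound on savings; mixed-precision layers and runtime buffers pull the realized reduction below the ideal figure.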

Good For

  • Commercial and Research Use: Licensed for both commercial and research use across a wide range of generative AI applications.
  • Agentic Applications: Ideal for knowledge retrieval, summarization, mobile AI-powered writing assistants, and query/prompt rewriting.
  • Resource-Constrained Environments: Quantized versions (SpinQuant, QLoRA) are specifically designed for on-device use cases with limited compute resources, such as mobile devices.