Overview
Llama 3.2-3B: Multilingual LLM for Agentic Applications
Llama 3.2-3B is a 3.21 billion parameter instruction-tuned model from Meta's Llama 3.2 collection, designed for multilingual text-in/text-out generative tasks. It uses an optimized transformer architecture and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023. During pretraining, logits from the larger Llama 3.1 8B and 70B models serve as token-level targets (knowledge distillation); alignment then uses Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
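The token-level distillation described above can be sketched as a KL-divergence loss between the teacher's and student's per-token distributions. This is a minimal illustrative sketch, not Meta's training code; the function and temperature parameter `T` are hypothetical names.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    averaged over tokens and scaled by T^2 (standard distillation form)."""
    p = softmax(teacher_logits / T)          # teacher target distribution
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits / T) + 1e-12)
    return float((p * (log_p - log_q)).sum(axis=-1).mean()) * T * T
```

When the student matches the teacher exactly the loss is zero; any mismatch yields a positive penalty that pushes the student's distribution toward the teacher's.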
Key Capabilities
- Multilingual Dialogue: Optimized for assistant-like chat and agentic applications in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Agentic Tasks: Performs well at knowledge retrieval, summarization, and query/prompt rewriting, and is well suited to mobile AI-powered writing assistants.
- Quantization Options: Supports quantized variants (4-bit groupwise weights, 8-bit dynamic activations) produced via SpinQuant or Quantization-Aware Training with LoRA (QLoRA), significantly reducing model size and improving inference speed on ARM CPUs.
- Long Context: Features a 128K-token context length, enabling processing of extensive inputs.
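The 4-bit groupwise weight scheme mentioned above can be sketched as follows: weights are split into fixed-size groups, and each group gets its own floating-point scale so that the full int4 range is used per group. This is an illustrative sketch under simple symmetric-quantization assumptions, not the SpinQuant or QLoRA recipe itself.

```python
import numpy as np

def quantize_groupwise_4bit(w, group_size=32):
    """Symmetric 4-bit groupwise quantization: one fp32 scale per group.
    Assumes w.size is divisible by group_size (illustrative simplification)."""
    flat = w.reshape(-1, group_size)
    # Map the largest magnitude in each group to +/-7 (symmetric int4 range).
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Reconstruct approximate fp32 weights from int4 codes and group scales."""
    return (q.astype(np.float32) * scale).reshape(shape)
```

Per-group scales keep the quantization error bounded by half a scale step within each group, which is why groupwise schemes preserve accuracy much better than a single per-tensor scale.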
Good For
- Mobile and Edge Devices: Its smaller size and optimized quantization make it suitable for deployment in constrained environments with limited compute resources.
- Multilingual Applications: Ideal for developing applications requiring robust performance across several supported languages.
- Dialogue Systems: Strong performance in instruction following and dialogue benchmarks makes it a solid choice for conversational AI and agentic workflows.
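For the dialogue and agentic use cases above, conversations are serialized into the Llama 3 instruct prompt format before generation. In practice the recommended route is `tokenizer.apply_chat_template` from the `transformers` library; the hand-rolled sketch below just makes the documented special-token layout visible.

```python
def build_llama3_prompt(messages):
    """Assemble a Llama 3 instruct-format prompt from role/content dicts.
    Sketch of the documented format; prefer tokenizer.apply_chat_template
    in real code so template changes are picked up automatically."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Open the assistant turn so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Generation then stops when the model emits its own `<|eot_id|>`, closing the assistant turn.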