meta-llama/Llama-3.2-3B

  • Access: Public (gated on Hugging Face)
  • Parameters: 3.2B
  • Precision: BF16
  • Context length: 32,768 tokens
  • Released: Sep 18, 2024
  • License: llama3.2
Overview

Llama 3.2-3B: Multilingual LLM for Agentic Applications

Llama 3.2-3B is a 3.21 billion parameter instruction-tuned model from Meta's Llama 3.2 collection, designed for multilingual text-in/text-out generative tasks. It uses an optimized transformer architecture and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023. During pretraining, the model distills knowledge from larger Llama 3.1 models by using their logits as token-level targets, and it is aligned with Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
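The logit-distillation step described above can be illustrated with a minimal sketch: the student is trained against the teacher's softened token distribution rather than hard labels. This is a generic knowledge-distillation loss, not Meta's exact training recipe; the vocabulary size, temperature, and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's predictions against the teacher's
    temperature-softened token distribution, averaged over positions."""
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))        # 4 positions, toy 10-token vocab
loss_match = distillation_loss(teacher, teacher)   # student agrees with teacher
loss_off = distillation_loss(-teacher, teacher)    # student disagrees with teacher
```

When the student matches the teacher exactly, the loss reduces to the entropy of the teacher's distribution; any mismatch increases it, which is what drives the student toward the teacher's behavior.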

Key Capabilities

  • Multilingual Dialogue: Optimized for assistant-like chat and agentic applications in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Agentic Tasks: Excels in knowledge retrieval, summarization, mobile AI-powered writing assistants, and query/prompt rewriting.
  • Quantization Options: Supports various quantization schemes (4-bit groupwise for weights, 8-bit dynamic for activations) including SpinQuant and QLoRA, significantly reducing model size and improving inference speed on ARM CPUs.
  • Long Context: A 32,768-token context window enables processing of long documents and extended multi-turn histories.
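The 4-bit groupwise weight quantization mentioned above can be sketched in a few lines of numpy: weights are split into small groups, each group gets one scale, and values are rounded to the signed 4-bit range. This is a generic symmetric scheme for illustration; the group size of 32 and the rounding details are assumptions, not Meta's exact SpinQuant/QLoRA recipe.

```python
import numpy as np

def quantize_groupwise_4bit(weights, group_size=32):
    """Quantize a 1-D float array to signed 4-bit integers ([-8, 7]),
    with one scale per group of `group_size` weights."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max magnitude to 7
    scales[scales == 0] = 1.0                            # avoid division by zero
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Recover approximate float weights from 4-bit codes and per-group scales.
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s = quantize_groupwise_4bit(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()  # bounded by half the largest group scale
```

Storing one scale per small group (rather than per tensor) keeps the quantization error local to each group, which is why groupwise schemes preserve accuracy much better than per-tensor 4-bit quantization while still cutting weight storage roughly 4x versus BF16.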

Good For

  • Mobile and Edge Devices: Its smaller size and optimized quantization make it suitable for deployment in constrained environments with limited compute resources.
  • Multilingual Applications: Ideal for developing applications requiring robust performance across several supported languages.
  • Dialogue Systems: Strong performance in instruction following and dialogue benchmarks makes it a solid choice for conversational AI and agentic workflows.