Model Overview
Llama 3.2-1B, developed by Meta, is a 1-billion-parameter multilingual large language model (LLM) from the Llama 3.2 collection. It features an optimized transformer architecture and is instruction-tuned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
Key Capabilities
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training data covering a broader set of languages beyond these eight.
- Dialogue Optimization: Specifically instruction-tuned for multilingual dialogue use cases.
- Agentic Applications: Excels in agentic retrieval and summarization tasks.
- Quantization: Available in quantized versions (SpinQuant, and quantization-aware training with LoRA adaptors, released as QLoRA checkpoints) optimized for on-device use with limited compute resources, trading a small amount of accuracy for faster inference and a substantially smaller memory footprint.
- Long Context: Supports a context length of 128K tokens (the quantized versions support 8K tokens).
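To make the quantization benefit concrete, the sketch below estimates the raw weight storage at different precisions. It is a rough back-of-the-envelope calculation, not a benchmark: the 1.23B parameter count is Meta's reported figure for Llama 3.2-1B, and activations, KV cache, and runtime overhead are ignored.

```python
# Rough weight-storage estimate for on-device deployment.
# Assumes 1.23e9 parameters (Meta's reported count for Llama 3.2-1B);
# ignores activations, KV cache, and runtime overhead.
PARAMS = 1.23e9

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB at the given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = weight_gb(16)  # full-precision baseline
int4 = weight_gb(4)   # 4-bit quantized weights (SpinQuant / QLoRA)
print(f"BF16: {bf16:.2f} GB, INT4: {int4:.2f} GB")
```

Roughly 2.5 GB of weights shrink to about 0.6 GB at 4 bits, which is what makes deployment on memory-constrained mobile hardware practical.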
Good For
- Commercial and Research Use: Intended for a wide range of applications.
- Assistant-like Chat: Ideal for building conversational AI assistants.
- Agentic Systems: Suitable for knowledge retrieval and summarization agents.
- Mobile AI: Quantized versions are designed for deployment in highly constrained environments, such as mobile devices.
- Prompt Rewriting: Can be used for query and prompt rewriting tasks.
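As a concrete illustration of the prompt-rewriting use case, the sketch below wraps a raw user query in the role/content chat-message format that Llama 3.2 Instruct models consume. The system-prompt wording is a hypothetical example, not an official Meta template.

```python
# Illustrative query-rewriting request in the role/content chat format.
# The system prompt here is an assumed example, not an official template.
def build_rewrite_messages(query: str) -> list[dict]:
    """Wrap a raw user query in a query-rewriting instruction."""
    system = (
        "Rewrite the user's query so it is self-contained and specific "
        "enough for a search engine. Reply with the rewritten query only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]

messages = build_rewrite_messages("weather there tomorrow?")
```

The resulting `messages` list can then be passed to a chat-completion endpoint, or rendered with `tokenizer.apply_chat_template(messages, ...)` when serving the model via Hugging Face transformers.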