1024m/Llama-3.2-3B-Base

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Sep 25, 2024 · License: llama3.2 · Architecture: Transformer

Llama 3.2-3B-Base is a 3.21-billion-parameter multilingual large language model from Meta, built on an optimized transformer architecture. As the base model in the Llama 3.2 collection, it was pretrained on up to 9 trillion tokens of publicly available online data with a December 2023 knowledge cutoff. It is intended for commercial and research use, supports multilingual text and code generation with a 128k-token context length, and serves as a foundation for a wide range of natural language generation tasks.


Llama 3.2-3B-Base: Multilingual Foundation Model

Meta's Llama 3.2-3B-Base is a 3.21-billion-parameter multilingual large language model built on an optimized transformer architecture with Grouped-Query Attention (GQA) for faster, more memory-efficient inference. It was pretrained on up to 9 trillion tokens of diverse public data, with knowledge distillation from the larger Llama 3.1 models incorporated into training, and supports a substantial 128k-token context length. The model is designed for commercial and research applications, offering capabilities in multilingual text and code generation.

Key Capabilities

  • Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader collection of languages.
  • Optimized Architecture: Features an optimized transformer architecture with GQA for efficient scaling.
  • Extensive Pretraining: Trained on a vast dataset of up to 9 trillion tokens, with a knowledge cutoff of December 2023.
  • Long Context: Supports a context length of 128k tokens, enabling processing of longer inputs.
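As a base (non-instruct) model, it is used for plain text continuation rather than chat. Below is a minimal generation sketch using the Hugging Face `transformers` library; the checkpoint id `meta-llama/Llama-3.2-3B` is the upstream gated repository and is an assumption here, not something this page specifies.

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumes the upstream checkpoint id "meta-llama/Llama-3.2-3B" (gated:
# requires accepting the Llama 3.2 license on Hugging Face).
MODEL_ID = "meta-llama/Llama-3.2-3B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so the module can be inspected without the large
    # torch/transformers dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Base models are raw completers: no chat template, plain continuation.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The capital of France is"))
```

Because this is a base model, prompts should be phrased as text to be continued; instruction-style prompts generally work better after fine-tuning.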

Good For

  • Foundation for Fine-tuning: Suitable for adaptation to a variety of natural language generation tasks.
  • Research and Commercial Use: Intended for broad application in both academic and enterprise settings.
  • Constrained Environments: At 3B parameters, the model is small enough to run in resource-constrained environments, such as mobile and edge devices.
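To see why the 3B size suits constrained hardware, a back-of-the-envelope weight-memory estimate helps. The sketch below is illustrative arithmetic only (real deployments also need memory for the KV cache and runtime overhead), and the 4-bit figure assumes a hypothetical quantization not mentioned on this page.

```python
# Rough weight-storage estimate for the 3.21B-parameter model.
PARAMS = 3.21e9  # parameter count from the model card

def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return params * bytes_per_param / 2**30

bf16_gib = weight_memory_gib(PARAMS, 2.0)   # BF16: 2 bytes/param, ~6 GiB
int4_gib = weight_memory_gib(PARAMS, 0.5)   # hypothetical 4-bit quantization, ~1.5 GiB
```

At BF16 the weights alone occupy roughly 6 GiB, which is why aggressive quantization is typically applied before on-device deployment.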