1024m/Llama-3.2-3B-Base
Llama 3.2-3B-Base is a 3.21 billion parameter multilingual large language model developed by Meta, built on an optimized transformer architecture. This base model is part of the Llama 3.2 collection, pretrained on up to 9 trillion tokens of publicly available online data with a December 2023 knowledge cutoff. It is intended for commercial and research use, supports multilingual text and code generation with a 128k-token context length, and serves as a foundation for a wide range of natural language generation tasks.
Llama 3.2-3B-Base: Multilingual Foundation Model
Meta's Llama 3.2-3B-Base is a 3.21 billion parameter multilingual large language model built on an optimized transformer architecture with Grouped-Query Attention (GQA) for faster, more memory-efficient inference. It was pretrained on up to 9 trillion tokens of diverse public data, with knowledge distillation from the larger Llama 3.1 models used during pretraining, and supports a 128k-token context length. The model is designed for commercial and research applications, offering multilingual text and code generation.
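As a base (non-instruct) checkpoint, the model is typically used for plain text completion rather than chat. Below is a minimal completion sketch using the Hugging Face transformers library; it assumes the repo id from this page resolves to the standard Llama weights layout (the official gated checkpoint is published as meta-llama/Llama-3.2-3B) and that transformers and a recent PyTorch are installed.

```python
# Minimal text-completion sketch for a base (non-instruct) Llama checkpoint.
# The repo id mirrors this page's title; downloading the official weights may
# require accepting Meta's license on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "1024m/Llama-3.2-3B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 3.21B weights at roughly 6.4 GB
    device_map="auto",           # place layers on GPU(s) if available
)

# Base models continue text; they do not follow chat-style instructions.
prompt = "The three most widely spoken languages in South America are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```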
Key Capabilities
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader collection of languages.
- Optimized Architecture: Uses Grouped-Query Attention (GQA), which shares key/value heads across groups of query heads to shrink the KV cache and speed up inference (see the sketch after this list).
- Extensive Pretraining: Trained on a vast dataset of up to 9 trillion tokens, with a knowledge cutoff of December 2023.
- Long Context: Supports a context length of 128k tokens, enabling processing of longer inputs.
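To make the GQA bullet concrete, the toy sketch below shows the core trick: a small set of key/value heads is shared across a larger set of query heads, so the KV cache shrinks proportionally. The head counts mirror the published Llama 3.2 3B configuration (24 query heads, 8 KV heads), but the tensors are random and the code is an illustration, not Meta's implementation.

```python
# Toy grouped-query attention: n_q query heads attend using only n_kv
# key/value heads, each KV head serving a group of n_q // n_kv query heads.
import torch
import torch.nn.functional as F

batch, seq, n_q, n_kv, head_dim = 1, 16, 24, 8, 128  # head counts per the 3B config
group = n_q // n_kv  # 3 query heads share each KV head

q = torch.randn(batch, n_q, seq, head_dim)
k = torch.randn(batch, n_kv, seq, head_dim)  # KV cache holds only n_kv heads...
v = torch.randn(batch, n_kv, seq, head_dim)  # ...one third the size of full MHA here

# Expand each KV head across its query group before standard attention.
k = k.repeat_interleave(group, dim=1)  # (batch, n_q, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 24, 16, 128])
```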
Good For
- Foundation for Fine-tuning: Suitable for adaptation to a variety of natural language generation tasks; a minimal LoRA sketch follows this list.
- Research and Commercial Use: Intended for broad application in both academic and enterprise settings.
- Constrained Environments: At 3B parameters, the model is small enough for deployment in resource-constrained settings such as mobile and other on-device applications.
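For the fine-tuning use case, a common low-cost route is parameter-efficient adaptation. The sketch below uses the Hugging Face peft library; the rank, alpha, and target modules are illustrative assumptions for Llama-style models, not values from the model card.

```python
# Minimal LoRA setup sketch with `peft`; hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("1024m/Llama-3.2-3B-Base")

lora_config = LoraConfig(
    r=16,                # adapter rank (assumed)
    lora_alpha=32,       # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 3.21B weights train

# From here, train with transformers.Trainer or trl's SFTTrainer on task data.
```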