1024m/Llama-3.2-3B-Base

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Ctx Length: 32k · Published: Sep 25, 2024 · License: llama3.2 · Architecture: Transformer

Llama 3.2-3B-Base is a 3.21-billion-parameter multilingual large language model from Meta, built on an optimized transformer architecture. As the base model in the Llama 3.2 collection, it was pretrained on up to 9 trillion tokens of publicly available online data with a December 2023 knowledge cutoff. It is intended for commercial and research use, supports multilingual text and code generation with a 128k-token context length, and serves as a foundation for a wide range of natural language generation tasks.


Llama 3.2-3B-Base: Multilingual Foundation Model

Meta's Llama 3.2-3B-Base is a 3.21-billion-parameter multilingual large language model built on an optimized transformer architecture with Grouped-Query Attention (GQA) for faster, more memory-efficient inference. It was pretrained on up to 9 trillion tokens of diverse public data, with knowledge distillation from the larger Llama 3.1 models incorporated into training, and supports a substantial 128k-token context length. The model is designed for commercial and research applications, offering capabilities in multilingual text and code generation.

Key Capabilities

  • Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader collection of languages.
  • Optimized Architecture: Features an optimized transformer architecture with GQA for efficient scaling.
  • Extensive Pretraining: Trained on a vast dataset of up to 9 trillion tokens, with a knowledge cutoff of December 2023.
  • Long Context: Supports a context length of 128k tokens, enabling processing of longer inputs.
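As a base (non-instruct) model, it is used for plain text continuation rather than chat. Below is a minimal generation sketch using the Hugging Face `transformers` library; the checkpoint id `meta-llama/Llama-3.2-3B` is the upstream gated repository and is an assumption here, not something this page specifies.

```python
# Minimal text-generation sketch with Hugging Face transformers.
# Assumes the upstream checkpoint id "meta-llama/Llama-3.2-3B" (gated:
# requires accepting the Llama 3.2 license on Hugging Face).
MODEL_ID = "meta-llama/Llama-3.2-3B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so the module can be inspected without the large
    # torch/transformers dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the published BF16 weights
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Base models are raw completers: no chat template, plain continuation.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The capital of France is"))
```

Because this is a base model, prompts should be phrased as text to be continued; instruction-style prompts generally work better after fine-tuning.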

Good For

  • Foundation for Fine-tuning: Suitable for adaptation to a variety of natural language generation tasks.
  • Research and Commercial Use: Intended for broad application in both academic and enterprise settings.
  • Constrained Environments: At 3B parameters, the model is small enough to run in resource-constrained environments, such as mobile and edge devices.
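To see why the 3B size suits constrained hardware, a back-of-the-envelope weight-memory estimate helps. The sketch below is illustrative arithmetic only (real deployments also need memory for the KV cache and runtime overhead), and the 4-bit figure assumes a hypothetical quantization not mentioned on this page.

```python
# Rough weight-storage estimate for the 3.21B-parameter model.
PARAMS = 3.21e9  # parameter count from the model card

def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return params * bytes_per_param / 2**30

bf16_gib = weight_memory_gib(PARAMS, 2.0)   # BF16: 2 bytes/param, ~6 GiB
int4_gib = weight_memory_gib(PARAMS, 0.5)   # hypothetical 4-bit quantization, ~1.5 GiB
```

At BF16 the weights alone occupy roughly 6 GiB, which is why aggressive quantization is typically applied before on-device deployment.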