project-free-llama/Llama-3.2-1B
Llama 3.2 1B is a 1.23-billion-parameter multilingual large language model developed by Meta and built on an optimized transformer architecture. Its instruction-tuned variant is optimized for multilingual dialogue, including agentic retrieval and summarization tasks, and for on-device uses such as mobile AI-powered writing assistants. The model officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and is intended for commercial and research use.
Llama 3.2 1B: Multilingual LLM for Dialogue and Agentic Tasks
Meta's Llama 3.2 1B is a 1.23-billion-parameter multilingual large language model in the Llama 3.2 collection. It is an auto-regressive model built on an optimized transformer architecture. The instruction-tuned versions are trained with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Key Capabilities & Features
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader set of languages.
- Optimized for Dialogue: The instruction-tuned version targets assistant-like chat and agentic applications such as knowledge retrieval, summarization, and mobile AI-powered writing assistants.
- Quantization Schemes: Meta's quantized variants use 4-bit groupwise quantization (group size 32) for the weights of the transformer's linear layers and 8-bit per-token dynamic quantization for activations. They are designed for efficient inference on Arm CPU backends in constrained environments such as mobile devices.
- Performance: Meta's benchmarks report faster decode and prefill, along with smaller model size and memory footprint, for the SpinQuant and QLoRA quantized variants compared with the BF16 baseline.
- Training Data: Pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
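The two quantization schemes listed above can be illustrated with a small pure-Python sketch. It assumes symmetric, round-to-nearest quantization with one scale per group of 32 weights and one scale per activation row (token), computed at runtime. The function names are hypothetical, and this is not Meta's ExecuTorch, SpinQuant, or QLoRA code, which involves additional steps (bit packing, calibration, training) not shown here.

```python
import random

GROUP_SIZE = 32  # group size reported for Meta's 4-bit weight scheme


def quantize_int4_groupwise(weights, group_size=GROUP_SIZE):
    """Symmetric 4-bit quantization (assumed), one scale per group of weights."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7, the top of the signed int4 range.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize_int4_groupwise(q, scales, group_size=GROUP_SIZE):
    """Recover approximate float weights using each value's group scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def quantize_int8_per_token(activations):
    """Dynamic 8-bit quantization: a fresh scale is computed for each token row."""
    q_rows, scales = [], []
    for row in activations:
        max_abs = max(abs(a) for a in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        scales.append(scale)
        q_rows.append([max(-128, min(127, round(a / scale))) for a in row])
    return q_rows, scales


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1, 1) for _ in range(64)]  # toy weights, two groups
    q, s = quantize_int4_groupwise(w)
    w_hat = dequantize_int4_groupwise(q, s)
    print("groups:", len(s), "max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Because each group has its own scale, an outlier weight only coarsens the 31 other weights in its group rather than the whole tensor. The per-token activation scales are recomputed on every forward pass, which is what "dynamic" refers to.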
Intended Use Cases
- Commercial and Research: Suitable for a wide range of applications in multiple languages.
- Agentic Applications: Ideal for systems requiring retrieval, summarization, and query rewriting.
- Constrained Environments: The 1B model, especially with quantization, is designed for deployment on devices with limited compute resources, such as mobile phones.
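A back-of-envelope estimate of weight storage shows why 4-bit weights matter on phones. The sketch below assumes all 1.23 billion parameters are stored at the same precision and, for the 4-bit case, adds one hypothetical 16-bit scale per group of 32 weights. It ignores the KV cache, activations, and the fact that Meta keeps the embedding and classification layers at 8 bits, so real checkpoint sizes will differ.

```python
PARAMS = 1.23e9  # parameter count stated on this card


def weight_bytes(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Approximate bytes needed for weights alone.

    If group_size is given, one scale of `scale_bits` bits per group is added
    (the 16-bit scale width is an assumption for illustration).
    """
    total_bits = num_params * bits_per_weight
    if group_size:
        total_bits += (num_params / group_size) * scale_bits
    return total_bits / 8


bf16 = weight_bytes(PARAMS, 16)
int4 = weight_bytes(PARAMS, 4, group_size=32)
print(f"BF16: {bf16 / 1e9:.2f} GB, int4 (g32): {int4 / 1e9:.2f} GB")
# prints "BF16: 2.46 GB, int4 (g32): 0.69 GB"
```

Under these assumptions the per-group scales add about 0.5 bits per weight, so the 4-bit weights take roughly 28% of the BF16 footprint.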