Model Overview
Fedir-Ilina/meta-llamaLlama-3.2-1B is a 1.23-billion-parameter model from Meta's Llama 3.2 collection, designed for multilingual text-in/text-out generative tasks. It is built on an optimized transformer architecture and instruction-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model supports a context length of 128K tokens and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
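As a text-in/text-out instruct model, it consumes prompts in the Llama 3 chat format. The sketch below assembles such a prompt by hand so the structure is visible; the special tokens follow Meta's published Llama 3 instruction format, while `build_prompt` itself is an illustrative helper, not a library API (in practice, `tokenizer.apply_chat_template` from `transformers` does this for you).

```python
# Minimal sketch: assemble a single-turn Llama 3-style instruct prompt.
# Special tokens follow Meta's published chat format for Llama 3 models;
# build_prompt is an illustrative helper, not part of any library.

def build_prompt(system: str, user: str) -> str:
    """Format one system + user turn, ending at the assistant header
    so the model continues with its reply."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt(
    "You are a helpful assistant.",
    "Summarize Llama 3.2 in one sentence.",
)
print(prompt)
```

When serving the model through `transformers`, prefer the tokenizer's built-in chat template over hand-rolled strings, since it stays in sync with the model's special-token vocabulary.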
Key Capabilities
- Multilingual Dialogue: Optimized for multilingual chat and agentic applications, including knowledge retrieval and summarization.
- Quantization Support: Offered in quantized variants (SpinQuant, and quantization-aware training with LoRA adaptors) for efficient deployment in constrained environments such as mobile devices, significantly improving decode speed and reducing memory footprint.
- Performance: Outperforms many open-source and closed chat models on industry benchmarks, particularly in multilingual contexts across languages like Portuguese, Spanish, Italian, German, French, Hindi, and Thai.
- Responsible AI: Developed with a strong focus on safety, incorporating extensive fine-tuning, scaled evaluations, and red teaming to mitigate critical risk areas, including CBRNE (chemical, biological, radiological, nuclear, and explosive) weapons uplift, child safety harms, and cyber attacks.
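The memory savings that the quantized variants target can be roughed out from the parameter count alone. The sketch below is a back-of-the-envelope estimate of weight storage for 1.23 billion parameters at different precisions; it deliberately ignores activations, the KV cache, and runtime overhead, so real deployments will need more memory than these figures.

```python
# Back-of-the-envelope weight-memory estimate for a 1.23B-parameter model.
# Counts weight storage only; activations, KV cache, and runtime overhead
# are excluded, so this is a lower bound on real memory use.

PARAMS = 1.23e9  # parameter count from the model card

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.2f} GiB weights")
```

Halving the bits per parameter halves the weight footprint, which is why 4-bit schemes are attractive for on-device use even before accounting for faster decode.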
Good For
- Multilingual Applications: Ideal for developing applications requiring robust performance across multiple languages.
- Resource-Constrained Environments: The quantized versions are well-suited for on-device use cases with limited compute resources, offering substantially faster inference and a smaller model footprint.
- Agentic Systems: Designed for integration into agentic applications for tasks like information retrieval and summarization.
- Research and Commercial Use: Intended for both academic research and commercial deployments, with a focus on responsible and safe AI development.