Overview
PASI1028/Llama-3.2-3B-Instruct is a 3.21-billion-parameter instruction-tuned model from Meta's Llama 3.2 family, designed for multilingual text-in/text-out generation. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and supports a context length of 128K tokens. The model was trained on up to 9 trillion tokens of publicly available data, incorporated knowledge distillation from the larger Llama 3.1 models, and was aligned using Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
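The GQA mechanism mentioned above can be sketched in a few lines: several query heads share a single key/value head, which shrinks the KV cache relative to standard multi-head attention. The head counts and dimensions below are toy values chosen only to illustrate the mechanism, not the real model's configuration.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Single-batch GQA over a sequence x of shape (seq_len, d_model).

    Illustrative sketch only: no causal mask, no RoPE, toy projections.
    """
    seq_len, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head

    q = (x @ wq).reshape(seq_len, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, n_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # each query head uses its group's KV head
        scores = q[:, h, :] @ k[:, kv, :].T / np.sqrt(head_dim)
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h, :] = weights @ v[:, kv, :]
    return out.reshape(seq_len, d_model)

rng = np.random.default_rng(0)
d_model, n_q, n_kv, seq = 64, 8, 2, 5  # hypothetical toy sizes
head_dim = d_model // n_q
wq = rng.normal(size=(d_model, d_model))
wk = rng.normal(size=(d_model, n_kv * head_dim))
wv = rng.normal(size=(d_model, n_kv * head_dim))

y = grouped_query_attention(rng.normal(size=(seq, d_model)), wq, wk, wv, n_q, n_kv)
print(y.shape)  # (5, 64)
```

Because the K and V projections produce only `n_kv_heads` heads instead of `n_q_heads`, the cached keys and values per token shrink by the same ratio, which is the main inference-memory benefit of GQA.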
Key Capabilities
- Multilingual Dialogue: Optimized for assistant-like chat and agentic applications in multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Agentic Tasks: Strong performance in knowledge retrieval and summarization.
- Quantization Support: Offers quantized variants (SpinQuant and QLoRA, i.e. Quantization-Aware Training with LoRA adaptors) for efficient deployment in constrained environments such as mobile devices, substantially reducing model size and memory use while improving inference speed.
- Robust Safety: Developed under a comprehensive safety strategy, including extensive safety fine-tuning, adversarial evaluations, and red teaming, to mitigate critical risks such as CBRNE misuse, child safety harms, and cyberattacks.
Intended Use Cases
This model is suitable for commercial and research applications requiring efficient, multilingual conversational AI. It is particularly well-suited for:
- Assistant-like chat applications.
- Agentic systems for knowledge retrieval and summarization.
- Mobile AI-powered writing assistants.
- Query and prompt rewriting.
- On-device use cases with limited compute resources, especially with its quantized versions.
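A rough back-of-envelope calculation shows why the quantized variants matter for the on-device use case above. The figures are approximate and cover weights only, ignoring the KV cache, activations, and any layers kept in higher precision (embeddings often are).

```python
# Approximate weight memory for a 3.21B-parameter model at several precisions.

PARAMS = 3.21e9  # parameter count from the model card

def weight_gib(bits_per_param):
    """Weight storage in GiB at the given precision (weights only)."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB")
```

At bf16 the weights alone are roughly 6 GiB, while a 4-bit scheme brings them to about 1.5 GiB, which is the difference between exceeding and fitting within a typical mobile device's memory budget.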