Model Overview
HaolunLi/LLaMA-3.2-3B-SRL is a 3B-parameter instruction-tuned model from Meta's Llama 3.2 family, optimized for multilingual dialogue. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and was trained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023. The model incorporates knowledge distillation from larger Llama 3.1 models, and was aligned through supervised fine-tuning (SFT), rejection sampling (RS), and Direct Preference Optimization (DPO).
Key Capabilities & Features
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with broader training across other languages.
- Optimized for Dialogue: Specifically designed for assistant-like chat, agentic retrieval, and summarization tasks.
- Quantization Schemes: Quantized variants are produced with SpinQuant and Quantization-Aware Training with LoRA adaptors (QLoRA), significantly improving inference speed (up to 2.6x faster decoding) and reducing model size and memory footprint for on-device deployment.
- Robust Safety Alignment: Developed with a focus on responsible AI, incorporating safety fine-tuning, extensive red teaming, and safeguards against critical risks such as CBRNE (chemical, biological, radiological, nuclear, and explosive) misuse and cyber attacks.
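For dialogue use, Llama 3.2 instruct models expect the standard Llama 3 chat template with special header tokens. As a minimal sketch, assuming this model inherits that template (in practice, prefer the tokenizer's `apply_chat_template`, which renders it for you):

```python
# Sketch of the Llama 3-family chat prompt format (assumed to apply here).
# This is illustrative; tokenizer.apply_chat_template is the robust path.

def build_llama3_prompt(system: str, user: str) -> str:
    """Render a one-turn conversation with Llama 3's special tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        # The trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a helpful assistant.", "Bonjour !")
print(prompt)
```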
Intended Use Cases
This model targets commercial and research applications requiring efficient, multilingual text generation in resource-constrained environments. It is particularly well suited for:
- Mobile AI-powered writing assistants.
- Query and prompt rewriting.
- Agentic applications like knowledge retrieval and summarization.
- On-device deployments where computational resources are limited.
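For quick experimentation, the model can be loaded with the Hugging Face `transformers` text-generation pipeline. A hedged sketch follows; the dtype and device settings are illustrative choices, not requirements stated by this model card:

```python
# Usage sketch: one chat turn via the `transformers` text-generation pipeline.

MODEL_ID = "HaolunLi/LLaMA-3.2-3B-SRL"

# Chat-style input; the pipeline applies the model's chat template.
messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Summarize: Llama 3.2 targets on-device use."},
]

def generate(max_new_tokens: int = 128) -> str:
    """Run one chat turn. Imports are deferred so the sketch can be read
    without torch/transformers installed; weights download on first call."""
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,  # illustrative; a 3B model fits in ~6 GB
        device_map="auto",
    )
    out = pipe(messages, max_new_tokens=max_new_tokens)
    # With chat input, generated_text is the message list plus the reply.
    return out[0]["generated_text"][-1]["content"]

# Example (requires downloading the weights): print(generate())
```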