ParetoQaft/3B-base: Multilingual Llama 3.2 Model
ParetoQaft/3B-base is a 3.21-billion-parameter model from Meta's Llama 3.2 collection, designed for multilingual text-in/text-out generative tasks. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) and was pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
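To make the GQA claim concrete, here is a back-of-the-envelope sketch of why Grouped-Query Attention shrinks the KV cache: several query heads share a single key/value head, so the cache stores proportionally fewer head states per token. The layer and head counts below are illustrative assumptions, not this model's published configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Approximate KV-cache size: keys + values (factor of 2), per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

# Full multi-head attention: one KV head per query head (24 here, illustrative).
mha = kv_cache_bytes(n_layers=28, n_kv_heads=24, head_dim=128, seq_len=4096)
# GQA: query heads grouped onto 8 shared KV heads (illustrative).
gqa = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, seq_len=4096)

print(mha / gqa)  # → 3.0: GQA cuts the cache 3x under these assumed head counts
```

With 24 query heads sharing 8 KV heads, the cache shrinks by the grouping factor (3x here), which is a large part of what makes long-context inference cheaper in memory.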
Key Capabilities & Features
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader range of languages.
- Optimized for Dialogue: Instruction-tuned versions are specifically optimized for multilingual dialogue, agentic retrieval, and summarization.
- Quantization Schemes: Offers quantized variants built with methods such as SpinQuant and QLoRA, delivering up to 2.6x faster decoding and a substantially smaller model size and memory footprint for constrained environments such as mobile devices.
- Performance: Demonstrates strong performance across various benchmarks, including MMLU, GSM8K, and MATH, often outperforming other models in its size class, particularly in multilingual contexts.
- Safety Alignment: Developed with a focus on responsible AI, incorporating supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for helpfulness and safety.
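The memory savings behind the quantization bullet above can be sketched with simple arithmetic: weight bytes scale linearly with bit width. The calculation below assumes a uniform 4-bit weight format against bf16 and ignores embeddings, activations, and packing overhead, so it is a rough illustration rather than the actual on-disk sizes.

```python
# Rough weight-memory estimate for a 3.21B-parameter model (the count stated above).
params = 3.21e9

bf16_gb = params * 2 / 1e9   # bf16: 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4-bit: 0.5 bytes per weight (illustrative, no overhead)

print(f"{bf16_gb:.1f} GB -> {int4_gb:.1f} GB")  # roughly 6.4 GB down to 1.6 GB
```

A 4x reduction of this kind is what moves a 3B model from "needs a GPU" into the memory budget of a phone, which is the point of the on-device quantization schemes.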
Good For
- Multilingual Chatbots & Assistants: Excels in dialogue-based applications requiring support for multiple languages.
- Agentic Applications: Ideal for tasks like knowledge retrieval and summarization.
- Resource-Constrained Environments: Quantized versions are suitable for on-device deployment with limited compute resources.
- Research & Commercial Use: Intended for both commercial and research applications, offering a robust foundation for natural language generation tasks.