ParetoQaft/3B-base
ParetoQaft/3B-base is a 3.21-billion-parameter multilingual large language model from Meta's Llama 3.2 family. This auto-regressive transformer uses Grouped-Query Attention (GQA) and a 128K-token context length, and its instruction-tuned variants are optimized for multilingual dialogue use cases, including agentic retrieval and summarization, outperforming many open-source and closed chat models on common industry benchmarks. The model officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with a knowledge cutoff of December 2023.
ParetoQaft/3B-base: Multilingual Llama 3.2 Model
ParetoQaft/3B-base is a 3.21 billion parameter model from Meta's Llama 3.2 collection, designed for multilingual text-in/text-out generative tasks. It utilizes an optimized transformer architecture with Grouped-Query Attention (GQA) and has been pretrained on up to 9 trillion tokens of publicly available online data, with a knowledge cutoff of December 2023.
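The parameter count alone fixes the weight memory needed at each precision, which is a useful back-of-the-envelope guide to deployment cost. A minimal sketch (the figures are derived from the 3.21B count, not taken from the card):

```python
# Rough weight-memory estimate for a dense 3.21B-parameter model at several
# precisions. Illustrative only: ignores KV cache, activations, and overhead.
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Bytes needed for the weights alone, expressed in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 3.21e9  # parameter count stated in the card

print(f"bf16: {weight_memory_gb(N_PARAMS, 16):.2f} GB")  # ~6.42 GB
print(f"int8: {weight_memory_gb(N_PARAMS, 8):.2f} GB")   # ~3.21 GB
print(f"int4: {weight_memory_gb(N_PARAMS, 4):.2f} GB")   # ~1.6 GB, quantized deployment
```

This is why the quantized variants matter for on-device use: 4-bit weights bring the model from roughly 6.4 GB down to under 2 GB before runtime overhead.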
Key Capabilities & Features
- Multilingual Support: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with training on a broader range of languages.
- Optimized for Dialogue: Instruction-tuned versions are specifically optimized for multilingual dialogue, agentic retrieval, and summarization.
- Quantization Schemes: Offers quantized variants built with SpinQuant and QLoRA, improving decode speed by up to 2.6x and reducing model size and memory footprint for constrained environments such as mobile devices.
- Performance: Demonstrates strong performance across various benchmarks, including MMLU, GSM8K, and MATH, often outperforming other models in its size class, particularly in multilingual contexts.
- Safety Alignment: Developed with a focus on responsible AI, incorporating supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) for helpfulness and safety.
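Since the dialogue-optimized variants are Llama 3.2 derivatives, they presumably expect the standard Llama 3 chat template. A minimal prompt builder, assuming this model keeps the stock Llama 3 special tokens (an assumption; in practice the tokenizer's own chat template is authoritative):

```python
# Sketch of the Llama 3-family chat prompt layout. The special tokens below
# are the public Llama 3 format, assumed (not confirmed by the card) to apply
# to this derivative as well.
def llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn chat prompt ending at the assistant header,
    so generation continues as the assistant's reply."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("Be concise.", "Bonjour ! Résume ce texte."))
```

When using the Hugging Face tokenizer, `tokenizer.apply_chat_template(...)` produces this layout automatically and should be preferred over hand-built strings.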
Good For
- Multilingual Chatbots & Assistants: Excels in dialogue-based applications requiring support for multiple languages.
- Agentic Applications: Ideal for tasks like knowledge retrieval and summarization.
- Resource-Constrained Environments: Quantized versions are suitable for on-device deployment with limited compute resources.
- Research & Commercial Use: Intended for both commercial and research applications, offering a robust foundation for natural language generation tasks.
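A minimal sketch of serving the model for multilingual chat with Hugging Face transformers. The model id comes from this card, but the system prompt, generation settings, and helper names are illustrative assumptions:

```python
# Hedged usage sketch for multilingual chat with transformers.
# Nothing here is prescribed by the card; it shows one plausible setup.
def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful multilingual assistant.") -> list:
    """Assemble a chat history in the role/content format that
    transformers chat pipelines accept."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def run_chat(user_prompt: str):
    """Generate a reply. Note: downloads several GB of weights on first call."""
    from transformers import pipeline  # deferred import: heavyweight dependency
    chat = pipeline("text-generation", model="ParetoQaft/3B-base", device_map="auto")
    out = chat(build_messages(user_prompt), max_new_tokens=256)
    # With chat-format input, generated_text holds the conversation;
    # the last turn is the assistant's reply.
    return out[0]["generated_text"]
```

For on-device deployment, the quantized variants described above would replace the full-precision checkpoint in the same pipeline call.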