ChuGyouk/Llama-3.1-8B: Multilingual Instruction-Tuned LLM
This model is an 8-billion-parameter instruction-tuned variant from Meta's Llama 3.1 collection, designed for multilingual dialogue and general-purpose text generation. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) for efficient inference. The model was aligned with human preferences for helpfulness and safety using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
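A minimal sketch of running the model as a chat assistant with the Hugging Face `transformers` library. The model id and system prompt are illustrative, and loading the actual weights requires accepting the license on the Hub; the `build_messages` helper is a hypothetical convenience, not part of any API.

```python
# Hedged sketch: chat-style usage via the transformers text-generation pipeline.
# MODEL_ID and the prompts are illustrative assumptions, not from this card.
from typing import Dict, List

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

def build_messages(system: str, user: str) -> List[Dict[str, str]]:
    """Build the role-based message list the instruction-tuned model expects."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def generate(user_prompt: str) -> str:
    # Import deferred so the sketch can be read/tested without the heavy dependency.
    from transformers import pipeline
    chat = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = chat(
        build_messages("You are a helpful assistant.", user_prompt),
        max_new_tokens=256,
    )
    # With message-list input, generated_text is the full conversation;
    # the last entry is the assistant's reply.
    return out[0]["generated_text"][-1]["content"]

msgs = build_messages("You are a helpful assistant.", "Bonjour !")
```

Because the model is instruction-tuned on this role-based chat format, passing a structured message list (rather than raw text) lets the pipeline apply the correct chat template automatically.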
Key Capabilities
- Multilingual Support: Officially supports 8 languages (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai), with a training data mix that covers a broader set of languages.
- Extended Context Window: Features a 128k-token context length, enabling processing of longer inputs and generation of longer outputs.
- Enhanced Performance: Improves over Llama 3 8B Instruct across various benchmarks, including MMLU (69.4%), HumanEval (72.6% pass@1), and MATH (51.9% final exact match).
- Tool Use: Shows significant gains on tool-use benchmarks such as API-Bank (82.6%) and BFCL (76.1%).
- Robust Safety: Developed with a focus on responsible deployment, incorporating safety fine-tuning and system safeguards like Llama Guard 3.
Good For
- Assistant-like Chatbots: Its instruction-tuned nature makes it suitable for conversational AI applications.
- Multilingual Applications: Ideal for tasks requiring understanding and generation in the supported languages.
- Code Generation and Math: Strong performance in HumanEval and MATH benchmarks suggests utility for programming and quantitative reasoning tasks.
- Research and Commercial Use: Intended for a wide range of commercial and research applications, with a focus on responsible deployment.
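For the tool-use scenarios mentioned above, a typical pattern is to describe each tool in a JSON-schema style and parse the model's structured tool-call reply. The schema and reply below are illustrative assumptions, not examples from this model card:

```python
# Hedged sketch: an illustrative tool definition and tool-call parsing.
# The schema shape and the reply string are assumptions for demonstration.
import json
from typing import Tuple

get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(raw: str) -> Tuple[str, dict]:
    """Parse a reply of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(raw)
    return call["name"], call["arguments"]

# A hypothetical model reply requesting a tool invocation.
reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
name, args = parse_tool_call(reply)
```

The application would then run the named function with the parsed arguments and feed the result back to the model as a follow-up message.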