Overview
Model Overview
RedHatAI/Llama-3.3-70B-Instruct is a 70-billion-parameter, instruction-tuned large language model developed by Meta and optimized for multilingual dialogue. It uses an auto-regressive transformer architecture and is aligned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to improve helpfulness and safety. The model was trained on over 15 trillion tokens of publicly available online data, has a knowledge cutoff of December 2023, and uses Grouped-Query Attention (GQA) for improved inference scalability.
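As a quick orientation, here is a minimal chat-inference sketch using the Transformers pipeline; the generation settings are illustrative, and running the full-precision 70B model requires multiple high-memory GPUs or a quantized variant.

```python
# Minimal chat-style inference sketch (assumes transformers, torch, and
# accelerate are installed and sufficient GPU memory is available; the
# model ID follows the Hugging Face repo name above).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="RedHatAI/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,   # half-precision weights to reduce memory use
    device_map="auto",            # spread layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Résume en une phrase ce qu'est la Grouped-Query Attention."},
]

output = generator(messages, max_new_tokens=128)
# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])
```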
Key Capabilities
- Multilingual Dialogue: Optimized for assistant-like chat in 8 supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- High Performance: Outperforms many open-source and closed chat models on common industry benchmarks, including significant improvements in MMLU Pro, GPQA Diamond, HumanEval, and MATH scores compared to previous Llama 3.1 models.
- Tool Use Support: Integrates with common tool-use formats, including function calling through the Transformers chat template (see the sketch after this list).
- Synthetic Data Generation: Capable of generating synthetic data and distilling knowledge to improve other models.
- Extended Context: Features a substantial 128k token context length, enabling processing of longer inputs.
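The tool-use item above refers to passing tool definitions through the model's chat template. Below is a minimal sketch assuming a recent Transformers release whose `apply_chat_template` accepts a `tools` argument; the `get_current_weather` function is a hypothetical example tool, not part of the model or library.

```python
# Sketch of function calling via the chat template (assumes a recent
# Transformers release where apply_chat_template accepts `tools`).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RedHatAI/Llama-3.3-70B-Instruct")

def get_current_weather(location: str, unit: str = "celsius"):
    """Get the current weather for a location.

    Args:
        location: City and country, e.g. "Paris, France".
        unit: Temperature unit, "celsius" or "fahrenheit".
    """
    ...

messages = [
    {"role": "user", "content": "What is the weather like in Paris right now?"}
]

# The chat template serializes the tool signature so the model can emit a
# structured tool call instead of a plain-text answer.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```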
Good For
- Multilingual Chatbots and Assistants: Its optimization for multilingual dialogue makes it ideal for building conversational AI applications across different languages.
- Complex Reasoning Tasks: Strong performance on benchmarks like MATH and GPQA Diamond suggests suitability for tasks requiring advanced reasoning.
- Code Generation and Understanding: Achieves high scores on HumanEval and MBPP EvalPlus, indicating proficiency in coding tasks.
- Research and Commercial Applications: Intended for both commercial and research use under the terms of the Llama 3.3 Community License.
- Deployment Flexibility: Can be deployed efficiently with vLLM, on Red Hat Enterprise Linux AI, and on OpenShift AI, with 8-bit and 4-bit quantization options to reduce memory requirements (see the vLLM sketch below).
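To make the deployment item concrete, here is a minimal offline-inference sketch using the vLLM Python API; it assumes a recent vLLM release that provides `LLM.chat`, and the `tensor_parallel_size` and sampling values are illustrative assumptions rather than recommendations. An 8-bit or 4-bit variant would be used by pointing `model` at a quantized checkpoint.

```python
# Offline inference sketch with vLLM (assumes vLLM is installed and enough
# GPU memory is available; tensor_parallel_size=4 is an illustrative
# assumption for a multi-GPU node, not a sizing recommendation).
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Llama-3.3-70B-Instruct",
    tensor_parallel_size=4,   # shard the 70B weights across 4 GPUs
    max_model_len=8192,       # cap context below the full 128k to save memory
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

messages = [
    {"role": "user", "content": "Summarize Grouped-Query Attention in two sentences."}
]

# llm.chat applies the model's chat template before generation.
outputs = llm.chat(messages, sampling)
print(outputs[0].outputs[0].text)
```

For serving behind an OpenAI-compatible API instead of offline inference, the same model ID can be passed to the `vllm serve` command.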