RedHatAI/Llama-3.3-70B-Instruct

Warm · Public · 70B · FP8 · 32768 context · May 9, 2025 · License: llama3.3 · Hugging Face

Model Overview

RedHatAI/Llama-3.3-70B-Instruct is a 70 billion parameter instruction-tuned large language model developed by Meta, optimized for multilingual dialogue. The model uses an auto-regressive transformer architecture, tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to improve helpfulness and safety. It was trained on over 15 trillion tokens of publicly available online data, has a knowledge cutoff of December 2023, and uses Grouped-Query Attention (GQA) for improved inference scalability.
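GQA's inference benefit comes mainly from shrinking the KV cache: only the key/value heads are cached, not one pair per query head. A back-of-envelope sketch, assuming the published Llama 3 70B configuration (80 layers, 64 query heads, 8 KV heads, head dimension 128); this is an illustrative estimate, not an official formula.

```python
# KV-cache size per token: K and V tensors for every layer, sized by the
# number of KV heads (not query heads). Config values below are assumed
# from the Llama 3 70B architecture: 80 layers, 8 KV heads, head_dim 128.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """2 tensors (K and V) * layers * KV heads * head_dim * dtype size."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

gqa = kv_cache_bytes_per_token(80, 8, 128)    # GQA: 8 shared KV heads
mha = kv_cache_bytes_per_token(80, 64, 128)   # hypothetical full-MHA baseline

print(gqa, mha, mha // gqa)  # GQA cuts the KV cache 8x (64 -> 8 KV heads)
```

At 16-bit precision this works out to roughly 320 KB of KV cache per token with GQA versus about 2.5 MB with a full multi-head baseline, which is what makes long contexts and large batch sizes tractable.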

Key Capabilities

  • Multilingual Dialogue: Optimized for assistant-like chat in 8 supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • High Performance: Outperforms many open-source and closed chat models on common industry benchmarks, including significant improvements in MMLU Pro, GPQA Diamond, HumanEval, and MATH scores compared to previous Llama 3.1 models.
  • Tool Use Support: Integrates with various tool use formats, including advanced chat templating in Transformers for function calling.
  • Synthetic Data Generation: Capable of generating synthetic data and distilling knowledge to improve other models.
  • Extended Context: Features a substantial 128k token context length, enabling processing of longer inputs.
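For the tool-use capability above, Transformers' chat templating accepts a list of JSON-schema-style tool definitions alongside the conversation. A minimal sketch of that structure follows; the `get_weather` tool and `make_tool` helper are hypothetical examples, not part of the model or library.

```python
# Sketch of the messages + tools structure consumed by Transformers chat
# templating (tokenizer.apply_chat_template(..., tools=...)) for function
# calling. The get_weather tool is a made-up illustration.

def make_tool(name, description, params):
    """Build a JSON-schema style function definition for the `tools` list."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

tools = [make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
)]

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Quel temps fait-il à Paris ?"},
]

# With the real tokenizer loaded, this would render the model's tool-use prompt:
# prompt = tokenizer.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False)
```

The model then emits a structured tool call that the serving layer parses and executes before the conversation continues.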

Good For

  • Multilingual Chatbots and Assistants: Its optimization for multilingual dialogue makes it ideal for building conversational AI applications across different languages.
  • Complex Reasoning Tasks: Strong performance on benchmarks like MATH and GPQA Diamond suggests suitability for tasks requiring advanced reasoning.
  • Code Generation and Understanding: Achieves high scores on HumanEval and MBPP EvalPlus, indicating proficiency in coding tasks.
  • Research and Commercial Applications: Intended for both commercial and research use under the Llama 3.3 Community License.
  • Deployment Flexibility: Can be efficiently deployed on vLLM, Red Hat Enterprise Linux AI, and OpenShift AI, with support for 8-bit and 4-bit quantization for memory optimization.
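To see why the 8-bit and 4-bit options matter for deployment, a rough weight-memory estimate for a 70B-parameter model under different precisions; this sketch ignores KV cache, activations, and quantization overhead such as per-tensor scales, so real footprints run somewhat higher.

```python
# Back-of-envelope weight memory for 70e9 parameters at various precisions.
# Illustrative only: excludes KV cache, activations, and quantization scales.

PARAMS = 70e9

def weight_gib(bits_per_param):
    """Weight storage in GiB for the given bits per parameter."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("BF16", 16), ("FP8/INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.0f} GiB")
```

At BF16 the weights alone need roughly 130 GiB, which exceeds a single 80 GB accelerator; the 8-bit build (~65 GiB) fits on one such device, and 4-bit (~33 GiB) leaves substantial headroom for KV cache.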