Model Overview
NVIDIA's Llama-3.1-Nemotron-70B-Instruct-HF is a 70-billion-parameter instruction-tuned large language model built on the Llama 3.1 architecture, supporting a 128K-token context window. NVIDIA customized this model specifically to improve the helpfulness and quality of LLM-generated responses to user queries. It was trained with REINFORCE, a Reinforcement Learning from Human Feedback (RLHF) method, using the Llama-3.1-Nemotron-70B-Reward model and the HelpSteer2-Preference dataset.
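To make the training approach concrete, here is a minimal, self-contained sketch of the REINFORCE policy-gradient update on a toy two-armed bandit. This is purely illustrative: the actual training optimizes a 70B LLM against scores from the Llama-3.1-Nemotron-70B-Reward model, whereas here the fixed `arm_rewards` list stands in for the reward model and the hyperparameters (`steps`, `lr`) are arbitrary choices.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy with the REINFORCE update:
    theta_i += lr * reward * (1[i == action] - pi(i)).

    `arm_rewards` is a stand-in for a reward model's scores.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = arm_rewards[action]
        # Push up the log-probability of the sampled action in
        # proportion to the reward it received.
        for i in range(len(theta)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += lr * reward * grad
    return softmax(theta)


# The policy learns to prefer the higher-reward arm.
probs = reinforce_bandit([0.2, 1.0])
```

The same principle scales up in RLHF: each "action" is a full model response, and the reward model's score plays the role of `arm_rewards`.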
Key Capabilities & Performance
- Enhanced Helpfulness: Customized to provide more helpful, factually correct, coherent, and customizable responses.
- Leading Alignment Benchmarks: As of October 1, 2024, it ranks #1 on several automatic alignment benchmarks, including Arena Hard (85.0), AlpacaEval 2 LC (57.6), and GPT-4-Turbo MT-Bench (8.98), outperforming models like GPT-4o and Claude 3.5 Sonnet.
- Robust Instruction Following: Demonstrates strong general-domain instruction following, correctly handling tricky queries (e.g., counting the r's in "strawberry") without specialized prompting.
Use Cases & Considerations
- General-Domain Instruction Following: Ideal for applications requiring highly helpful and accurate responses across a broad range of topics.
- Research and Development: Useful for exploring advanced RLHF techniques and model alignment strategies.
- Hardware Requirements: Requires significant computational resources: two or more 80 GB NVIDIA Ampere-generation (or newer) GPUs and at least 150 GB of disk space for deployment with Hugging Face Transformers.
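For deployment with Hugging Face Transformers, a minimal usage sketch might look like the following. This assumes access to the gated `nvidia/Llama-3.1-Nemotron-70B-Instruct-HF` repository and hardware meeting the requirements above; `max_new_tokens=256` is an arbitrary illustrative choice.

```python
import torch
from transformers import pipeline

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

# device_map="auto" shards the 70B model across available GPUs;
# bfloat16 halves memory relative to float32.
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [{"role": "user", "content": "How many r's are in strawberry?"}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])
```

Note that loading the model requires accepting the Llama 3.1 license on Hugging Face and authenticating with an access token.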
This model demonstrates NVIDIA's techniques for improving helpfulness in general-domain instruction following; it has not been specifically tuned for specialized domains such as mathematics.