unsloth/Llama-3.1-Nemotron-Nano-8B-v1

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · Published: May 11, 2025 · License: nvidia-open-model-license · Architecture: Transformer · Open weights

Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, derived from Meta Llama-3.1-8B-Instruct. This model is specifically post-trained for enhanced reasoning capabilities, human chat preferences, and tasks like RAG and tool calling, offering a balance of accuracy and efficiency. It supports a context length of 128K tokens and is designed for developers building AI agent systems, chatbots, and instruction-following applications.


Model Overview

Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, built upon Meta Llama-3.1-8B-Instruct. It is specifically engineered for superior reasoning, human chat preferences, and tasks such as Retrieval Augmented Generation (RAG) and tool calling. The model underwent a multi-phase post-training process, including supervised fine-tuning for Math, Code, Reasoning, and Tool Calling, as well as reinforcement learning stages using REINFORCE Leave-One-Out (RLOO) and Online Reward-aware Preference Optimization (RPO) for chat and instruction-following.

Key Capabilities & Features

  • Enhanced Reasoning: Significantly improved performance in reasoning tasks, as evidenced by benchmarks like MATH500 (95.4% pass@1 in Reasoning On mode) and GPQA-D (54.1% pass@1).
  • Flexible Reasoning Modes: Supports both "Reasoning On" and "Reasoning Off" modes, controlled via the system prompt, allowing optimization for different task requirements.
  • Instruction Following: Strong performance in instruction-following tasks, with IFEval scores up to 82.1% (Strict:Instruction in Reasoning Off mode).
  • Code Generation: Achieves 84.6% pass@1 on MBPP 0-shot in Reasoning On mode, indicating robust code generation capabilities.
  • Efficiency: Designed to fit on a single RTX GPU, making it suitable for local deployment and offering a balance between accuracy and computational efficiency.
  • Extended Context: Supports a context length of up to 128K tokens, enabling processing of longer inputs and more complex interactions.

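The "Reasoning On" / "Reasoning Off" toggle above is driven entirely by the system prompt. A minimal sketch of building the chat messages, assuming NVIDIA's documented `detailed thinking on` / `detailed thinking off` system-prompt convention (verify the exact prompt strings against the upstream model card for your deployment):

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list that toggles the model's reasoning mode.

    The mode is controlled purely by the system prompt: "detailed thinking on"
    enables step-by-step reasoning, "detailed thinking off" disables it.
    (Prompt strings follow NVIDIA's model card; confirm for your deployment.)
    """
    mode = "on" if reasoning else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]


# Reasoning On: suited to math/code tasks like those in MATH500 or MBPP.
messages_on = build_messages("Solve: what is 17 * 24?", reasoning=True)

# Reasoning Off: lower latency, suited to plain chat and instruction following.
messages_off = build_messages("Summarize this paragraph in one line.", reasoning=False)

print(messages_on[0])  # {'role': 'system', 'content': 'detailed thinking on'}
```

The message list can then be passed unchanged to any OpenAI-compatible chat-completions endpoint or to a `transformers` chat template.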
Ideal Use Cases

  • AI Agent Systems: Developers designing intelligent AI agents that require advanced reasoning.
  • Chatbots: Building sophisticated conversational AI applications with improved human chat preferences.
  • RAG Systems: Enhancing Retrieval Augmented Generation workflows with better reasoning and instruction following.
  • Tool Calling: Applications requiring the model to effectively use external tools.
  • General Instruction Following: Suitable for a wide range of instruction-based tasks in English and in programming languages, with support for several non-English languages.
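For the tool-calling use case, a request typically carries tool definitions in the OpenAI function-calling schema, which OpenAI-compatible servers (e.g. vLLM) accept. A sketch of such a request body; the `get_weather` tool and its parameters are hypothetical examples, not part of the model:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not provided by the model
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body for a chat-completions call; the model id matches this card.
request_body = {
    "model": "unsloth/Llama-3.1-Nemotron-Nano-8B-v1",
    "messages": [
        {"role": "user", "content": "What's the weather in Oslo?"},
    ],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

# The body must be JSON-serializable before it is POSTed to the endpoint.
payload = json.dumps(request_body)
```

The server's response would then contain a `tool_calls` entry when the model decides to invoke the tool, which the application executes before returning the result in a follow-up `tool` message.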