Model Overview

Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, based on Meta Llama-3.1-8B-Instruct. It is specifically post-trained to enhance reasoning capabilities, human chat preferences, and tasks like RAG and tool calling, aiming for an optimal balance between accuracy and computational efficiency. The model supports a substantial context length of 128K tokens and can run on a single RTX GPU, making it suitable for local deployment.

Key Capabilities & Features

Enhanced Reasoning: Underwent multi-phase post-training, including supervised fine-tuning for Math, Code, Reasoning, and Tool Calling, and multiple reinforcement learning stages.
Flexible Reasoning Modes: Supports distinct "Reasoning On" and "Reasoning Off" modes, controlled via the system prompt, with specific recommendations for temperature and top_p settings.
Performance Improvements: Demonstrates significant improvements in reasoning benchmarks like MATH500 (95.4% pass@1 in Reasoning On) and AIME25 (47.1% pass@1 in Reasoning On) compared to its "Reasoning Off" mode.
Multilingual Support: Primarily intended for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Commercial Use: Ready for commercial applications, governed by the NVIDIA Open Model License and Llama 3.1 Community License.

Ideal Use Cases

AI Agent Systems: Designed to power intelligent agents requiring robust reasoning.
Chatbots: Optimized for human chat preferences and instruction-following.
RAG Systems: Suitable for retrieval-augmented generation applications.
Instruction Following: Excels in general instruction-following tasks, balancing accuracy and compute efficiency.

Overview

Model Overview

Key Capabilities & Features

Ideal Use Cases

Full Model Card (README)