Model Overview
NVIDIA's Llama-3.1-Nemotron-Nano-8B-v1 is an 8-billion-parameter large language model derived from Meta's Llama-3.1-8B-Instruct. It underwent a multi-phase post-training process: supervised fine-tuning for Math, Code, Reasoning, and Tool Calling, followed by reinforcement learning stages (REINFORCE and Online Reward-aware Preference Optimization) for chat and instruction following. The model targets a strong balance between accuracy and compute efficiency and can run locally on a single RTX GPU.
Key Capabilities
- Enhanced Reasoning: Significantly improved performance in reasoning tasks, as demonstrated by benchmarks like MATH500 (95.4% pass@1 with reasoning on) and AIME25 (47.1% pass@1 with reasoning on).
- Instruction Following & Chat: Optimized for human chat preferences and general instruction following, with specific modes for "Reasoning On" and "Reasoning Off" controlled via system prompts.
- Tool Calling: Features improved capabilities for tool calling, as indicated by BFCL v2 Live scores.
- Code Generation: Strong performance in code generation, achieving 84.6% pass@1 on MBPP 0-shot with reasoning on.
- Multilingual Support: Primarily intended for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Extended Context: Supports a context length of up to 131,072 tokens.
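The "Reasoning On" / "Reasoning Off" modes mentioned above are selected through the system prompt. A minimal sketch of building a chat payload with that toggle; the exact control strings ("detailed thinking on"/"off") are an assumption here and should be verified against the official model card:

```python
def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Build a chat message list that toggles the model's reasoning mode.

    The control strings below are assumed from NVIDIA's published usage
    guidance; confirm them against the official model card before use.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning enabled for a math problem; disable it for casual chat.
messages = build_messages("Prove that the sum of two even numbers is even.")
```

The resulting list can be passed to any OpenAI-compatible chat endpoint or to a tokenizer's chat template; only the system prompt changes between the two modes.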
Good For
- Developers building AI Agent systems, chatbots, and RAG systems.
- Applications requiring a strong balance of model accuracy and compute efficiency.
- Local deployment on single RTX GPUs.
- Tasks involving complex reasoning, mathematical problem-solving, and code generation.