nvidia/Llama-3.1-Nemotron-Nano-8B-v1

  • Parameters: 8B
  • Precision: FP8
  • Context window: 32768
  • Released: Mar 16, 2025
  • License: other

Model Overview

NVIDIA's Llama-3.1-Nemotron-Nano-8B-v1 is an 8-billion-parameter large language model derived from Meta's Llama-3.1-8B-Instruct. It underwent a multi-phase post-training process: supervised fine-tuning for math, code, reasoning, and tool calling, followed by reinforcement learning stages (REINFORCE and Online Reward-aware Preference Optimization) for chat and instruction following. The model aims to balance accuracy with computational efficiency and can run locally on a single RTX GPU.

Key Capabilities

  • Enhanced Reasoning: Significantly improved performance in reasoning tasks, as demonstrated by benchmarks like MATH500 (95.4% pass@1 with reasoning on) and AIME25 (47.1% pass@1 with reasoning on).
  • Instruction Following & Chat: Optimized for human chat preferences and general instruction following, with distinct "Reasoning On" and "Reasoning Off" modes toggled via the system prompt.
  • Tool Calling: Features improved capabilities for tool calling, as indicated by BFCL v2 Live scores.
  • Code Generation: Strong performance in code generation, achieving 84.6% pass@1 on MBPP 0-shot with reasoning on.
  • Multilingual Support: Primarily intended for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Extended Context: Supports a context length of up to 131,072 tokens.
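Since reasoning mode is controlled through the system prompt, the toggle can be sketched as a small helper. This is a minimal sketch: the system-prompt strings "detailed thinking on" / "detailed thinking off" follow NVIDIA's published convention for Nemotron models, and `build_messages` is a hypothetical helper name, not part of any official SDK.

```python
def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Build a chat message list for Llama-3.1-Nemotron-Nano-8B-v1.

    Reasoning mode is toggled via the system prompt:
    "detailed thinking on" enables step-by-step reasoning,
    "detailed thinking off" disables it.
    """
    mode = "on" if reasoning else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]

# Example: the same question in both modes
on_msgs = build_messages("Solve: what is 17 * 23?", reasoning=True)
off_msgs = build_messages("Summarize this paragraph.", reasoning=False)
```

The resulting message lists can be passed to any chat-template-aware runtime (e.g., a tokenizer's chat template or an OpenAI-compatible endpoint).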

Good For

  • Developers building AI Agent systems, chatbots, and RAG systems.
  • Applications requiring a strong balance of model accuracy and compute efficiency.
  • Local deployment on single RTX GPUs.
  • Tasks involving complex reasoning, mathematical problem-solving, and code generation.
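For local deployment behind an OpenAI-compatible endpoint, a chat-completion request body might be assembled as below. This is a sketch under stated assumptions: the serving stack (e.g., vLLM's OpenAI-compatible server) and the sampling defaults shown are illustrative choices, not values taken from this card.

```python
import json

def chat_request_body(messages: list[dict],
                      temperature: float = 0.6,
                      top_p: float = 0.95,
                      max_tokens: int = 1024) -> str:
    """Serialize an OpenAI-style chat-completion request for a locally
    served Llama-3.1-Nemotron-Nano-8B-v1 instance.

    The sampling values here (temperature=0.6, top_p=0.95) are assumed
    defaults for reasoning mode, not guarantees from the model card.
    """
    return json.dumps({
        "model": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1",
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    })
```

The returned JSON string can be POSTed to the endpoint's `/v1/chat/completions` route with any HTTP client.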