unsloth/Llama-3.1-Nemotron-Nano-8B-v1

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · Published: May 11, 2025 · License: nvidia-open-model-license · Architecture: Transformer · Open weights

Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, derived from Meta Llama-3.1-8B-Instruct. This model is specifically post-trained for enhanced reasoning capabilities, human chat preferences, and tasks like RAG and tool calling, offering a balance of accuracy and efficiency. It supports a context length of 128K tokens and is designed for developers building AI agent systems, chatbots, and instruction-following applications.


Model Overview

Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, built upon Meta Llama-3.1-8B-Instruct. It is specifically engineered for superior reasoning, human chat preferences, and tasks such as Retrieval Augmented Generation (RAG) and tool calling. The model underwent a multi-phase post-training process, including supervised fine-tuning for Math, Code, Reasoning, and Tool Calling, as well as reinforcement learning stages using REINFORCE Leave-One-Out (RLOO) and Online Reward-aware Preference Optimization (RPO) for chat and instruction-following.

Key Capabilities & Features

  • Enhanced Reasoning: Significantly improved performance in reasoning tasks, as evidenced by benchmarks like MATH500 (95.4% pass@1 in Reasoning On mode) and GPQA-D (54.1% pass@1).
  • Flexible Reasoning Modes: Supports both "Reasoning On" and "Reasoning Off" modes, controlled via the system prompt, allowing optimization for different task requirements.
  • Instruction Following: Strong performance in instruction-following tasks, with IFEval scores up to 82.1% (Strict:Instruction in Reasoning Off mode).
  • Code Generation: Achieves 84.6% pass@1 on MBPP 0-shot in Reasoning On mode, indicating robust code generation capabilities.
  • Efficiency: Designed to fit on a single RTX GPU, making it suitable for local deployment and offering a balance between accuracy and computational efficiency.
  • Extended Context: Supports a context length of up to 128K tokens, enabling processing of longer inputs and more complex interactions.

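The "Reasoning On" / "Reasoning Off" toggle above is driven entirely by the system prompt. A minimal sketch of building the chat messages, assuming NVIDIA's documented `detailed thinking on` / `detailed thinking off` system-prompt convention (verify the exact prompt strings against the upstream model card for your deployment):

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list that toggles the model's reasoning mode.

    The mode is controlled purely by the system prompt: "detailed thinking on"
    enables step-by-step reasoning, "detailed thinking off" disables it.
    (Prompt strings follow NVIDIA's model card; confirm for your deployment.)
    """
    mode = "on" if reasoning else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]


# Reasoning On: suited to math/code tasks like those in MATH500 or MBPP.
messages_on = build_messages("Solve: what is 17 * 24?", reasoning=True)

# Reasoning Off: lower latency, suited to plain chat and instruction following.
messages_off = build_messages("Summarize this paragraph in one line.", reasoning=False)

print(messages_on[0])  # {'role': 'system', 'content': 'detailed thinking on'}
```

The message list can then be passed unchanged to any OpenAI-compatible chat-completions endpoint or to a `transformers` chat template.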
Ideal Use Cases

  • AI Agent Systems: Developers designing intelligent AI agents that require advanced reasoning.
  • Chatbots: Building sophisticated conversational AI applications with improved human chat preferences.
  • RAG Systems: Enhancing Retrieval Augmented Generation workflows with better reasoning and instruction following.
  • Tool Calling: Applications requiring the model to effectively use external tools.
  • General Instruction Following: Suitable for a wide range of instruction-based tasks in English and in programming languages, with support for several non-English languages.
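For the tool-calling use case, a request typically carries tool definitions in the OpenAI function-calling schema, which OpenAI-compatible servers (e.g. vLLM) accept. A sketch of such a request body; the `get_weather` tool and its parameters are hypothetical examples, not part of the model:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # example tool, not provided by the model
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body for a chat-completions call; the model id matches this card.
request_body = {
    "model": "unsloth/Llama-3.1-Nemotron-Nano-8B-v1",
    "messages": [
        {"role": "user", "content": "What's the weather in Oslo?"},
    ],
    "tools": [get_weather_tool],
    "tool_choice": "auto",
}

# The body must be JSON-serializable before it is POSTed to the endpoint.
payload = json.dumps(request_body)
```

The server's response would then contain a `tool_calls` entry when the model decides to invoke the tool, which the application executes before returning the result in a follow-up `tool` message.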