Model Overview
NVIDIA's Llama-3.1-Nemotron-Nano-8B-v1 is an 8-billion-parameter large language model derived from Meta's Llama-3.1-8B-Instruct. It underwent a multi-phase post-training process: supervised fine-tuning for Math, Code, Reasoning, and Tool Calling, followed by reinforcement learning stages (REINFORCE and Online Reward-aware Preference Optimization) for chat and instruction following. The model targets a strong balance between accuracy and compute efficiency and can run locally on a single RTX GPU.
Key Capabilities
- Enhanced Reasoning: Significantly improved performance in reasoning tasks, as demonstrated by benchmarks like MATH500 (95.4% pass@1 with reasoning on) and AIME25 (47.1% pass@1 with reasoning on).
- Instruction Following & Chat: Optimized for human chat preferences and general instruction following, with specific modes for "Reasoning On" and "Reasoning Off" controlled via system prompts.
- Tool Calling: Features improved capabilities for tool calling, as indicated by BFCL v2 Live scores.
- Code Generation: Strong performance in code generation, achieving 84.6% pass@1 on MBPP 0-shot with reasoning on.
- Multilingual Support: Primarily intended for English and coding languages, with additional support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Extended Context: Supports a context length of up to 131,072 tokens.
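The "Reasoning On" / "Reasoning Off" modes mentioned above are selected through the system prompt. A minimal sketch of building a chat payload with that toggle; the exact control strings ("detailed thinking on"/"off") are an assumption here and should be verified against the official model card:

```python
def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Build a chat message list that toggles the model's reasoning mode.

    The control strings below are assumed from NVIDIA's published usage
    guidance; confirm them against the official model card before use.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning enabled for a math problem; disable it for casual chat.
messages = build_messages("Prove that the sum of two even numbers is even.")
```

The resulting list can be passed to any OpenAI-compatible chat endpoint or to a tokenizer's chat template; only the system prompt changes between the two modes.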
Good For
- Developers building AI Agent systems, chatbots, and RAG systems.
- Applications requiring a strong balance of model accuracy and compute efficiency.
- Local deployment on single RTX GPUs.
- Tasks involving complex reasoning, mathematical problem-solving, and code generation.