Nemotron-Orchestrator-8B: An Efficient Orchestration Model

Nemotron-Orchestrator-8B, developed by NVIDIA and the University of Hong Kong, is an 8-billion-parameter model built on the Qwen3-8B base. It specializes in orchestrating heterogeneous toolsets and expert models to solve complex, multi-turn agentic tasks. The model is trained with Group Relative Policy Optimization (GRPO) using a novel reward function that balances accuracy, latency, cost, and user preferences.
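To make the training signal concrete, here is a minimal sketch of how a multi-objective reward could feed the group-relative advantage that GRPO is named for. The `Rollout` fields, the linear form of `scalar_reward`, and every weight are hypothetical placeholders chosen for illustration; the model's actual reward function is not specified here. Only the normalization step (reward minus group mean, divided by group standard deviation) reflects the standard GRPO formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    correct: bool      # did the trajectory reach the right answer?
    latency_s: float   # wall-clock time spent on tool and model calls
    cost_usd: float    # total price of those calls
    preference: float  # hypothetical score in [0, 1] for honoring user tool preferences


def scalar_reward(r: Rollout, w_latency: float = 0.01,
                  w_cost: float = 0.5, w_pref: float = 0.2) -> float:
    # Hypothetical linear trade-off: reward accuracy and preference match,
    # penalize latency and cost. The weights are illustrative assumptions.
    return (float(r.correct)
            - w_latency * r.latency_s
            - w_cost * r.cost_usd
            + w_pref * r.preference)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    # GRPO-style advantage: normalize each reward against the other
    # rollouts sampled for the same prompt, with no learned value function.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]
```

Under these assumed weights, a correct rollout that is also fast and cheap receives the largest advantage in its group, while a correct but slow and expensive one is pushed toward the group mean.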

Key Capabilities

Intelligent Orchestration: Manages a wide array of tools, from basic functions such as search and code execution to other specialized and generalist LLMs.
Efficiency: Achieves high accuracy at significantly lower computational cost than larger, monolithic models.
Robust Generalization: Demonstrates the ability to adapt to new, unseen tools and varying pricing configurations.
Performance: On the Humanity's Last Exam (HLE) benchmark, it scores 37.1%, surpassing GPT-5 (35.1%) while being approximately 2.5 times faster and more cost-efficient. It also outperforms strong monolithic systems on FRAMES and τ²-Bench.
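The orchestration pattern described in the list above can be sketched as a simple multi-turn loop. This is a generic illustration, not the model's actual interface: `run_orchestration`, `scripted_policy`, and `demo_tools` are hypothetical names, and the scripted policy merely stands in for the orchestrator model, which in practice would decide each step itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (tool name, argument, observation)


def run_orchestration(question: str,
                      tools: Dict[str, Tool],
                      choose_action: Callable[[str, List[Step]], Tuple[str, str]],
                      max_turns: int = 8) -> Tuple[Optional[str], List[Step]]:
    """Ask the policy for one action per turn until it answers or turns run out."""
    history: List[Step] = []
    for _ in range(max_turns):
        name, arg = choose_action(question, history)
        if name == "answer":
            return arg, history
        if name not in tools:
            # Feed the error back so the policy can pick a different tool.
            history.append((name, arg, f"error: unknown tool {name!r}"))
            continue
        history.append((name, arg, tools[name](arg)))
    return None, history  # turn budget exhausted without an answer


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    # Stand-in for the orchestrator model: search once, then answer
    # with whatever the search returned.
    if not history:
        return "search", question
    return "answer", history[-1][2]


# Stub toolset; a real deployment would wrap search engines,
# code interpreters, or other LLM endpoints here.
demo_tools: Dict[str, Tool] = {
    "search": lambda query: "Paris is the capital of France.",
}
```

Calling `run_orchestration("What is the capital of France?", demo_tools, scripted_policy)` returns the stubbed search result as the answer, along with a one-step history. Keeping tools behind a plain name-to-callable mapping is one way to let new, unseen tools be added without changing the loop.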

Ideal Use Cases

Complex Agentic Workflows: Suited for applications requiring the coordination of multiple AI models and tools to complete intricate, multi-step tasks.
Cost-Sensitive AI Deployments: Provides a high-performance alternative to larger models, offering better accuracy-to-cost ratios.
Research and Development: Primarily intended for exploring advanced agentic AI systems and tool orchestration strategies.
