Overview
NVIDIA's Nemotron-Cascade-8B-Thinking is an 8-billion-parameter general-purpose language model built on the Qwen3-8B-Base architecture. It is trained with a distinctive pipeline: multi-stage Supervised Fine-Tuning (SFT) followed by Cascade Reinforcement Learning (RL) across multiple domains. The model is optimized exclusively for a "thinking" mode, enhancing its ability to perform complex reasoning tasks.
Key Capabilities
- Advanced Reasoning: Achieves best-in-class performance across a diverse set of benchmarks including general-knowledge reasoning, mathematical reasoning (e.g., AIME 2024/2025), and competitive programming (LiveCodeBench).
- Reinforcement Learning Enhancement: Uses RLHF as a preliminary stage to substantially boost complex reasoning, with subsequent domain-wise RLVR stages further refining performance without degradation.
- Code Performance: Demonstrates strong results on coding benchmarks such as LiveCodeBench (LCB v5, v6) and SWE-bench Verified, with scores comparable to much larger models like DeepSeek-R1-0528 (671B).
- Alignment and Instruction Following: Shows robust performance in alignment benchmarks such as ArenaHard and IFBench.
- Optimized for "Thinking" Mode: Designed specifically for tasks requiring deep analytical thought, reflected in its chat template, which requires a " /think" tag appended to the user input.
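As a hedged sketch of how the " /think" tag might be applied (the authoritative format is defined by the model's own chat template; the helper name below is illustrative, not an official API):

```python
def build_messages(user_prompt: str) -> list:
    """Append the " /think" tag to the user turn, as this model's
    chat template expects. The helper name is hypothetical."""
    return [{"role": "user", "content": user_prompt + " /think"}]

messages = build_messages("What is 17 * 24?")
# In practice, these messages would then be passed to
# tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# before generation.
```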
Usage Recommendations
- Sampling Parameters: temperature = 0.6 and top_p = 0.95 are recommended for local deployment.
- Long Context Support: Extended context lengths are supported via RoPE scaling with the YaRN method; a scaling factor of 2.0 is recommended for this model across all benchmarks.
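The recommendations above can be collected into configuration snippets. The dictionaries below are a sketch: the sampling values come from this card, while the rope_scaling key names follow the convention commonly used in Qwen-family config.json files and are an assumption here.

```python
# Recommended sampling settings from the model card (sketch).
generation_config = {
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,  # sampling, not greedy decoding
}

# YaRN RoPE scaling with factor 2.0. Key names mirror those seen in
# Qwen-family config.json files; verify against the released config.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
}
```

These values would typically be merged into the model's generation call and config.json respectively, depending on the serving framework.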
Good For
- Applications requiring strong general-purpose reasoning and problem-solving.
- Tasks involving mathematical and logical deduction.
- Code generation and software engineering challenges.
- Scenarios where a model's "thought process" or intermediate reasoning steps are beneficial.