nvidia/Nemotron-Cascade-8B-Thinking

Hugging Face · Text Generation · Open Weights

  • Model size: 8B
  • Quantization: FP8
  • Context length: 32k
  • Concurrency cost: 1
  • Architecture: Transformer
  • Published: Dec 8, 2025
  • License: nvidia-open-model-license

Nemotron-Cascade-8B-Thinking is an 8 billion parameter general-purpose language model developed by NVIDIA, post-trained from Qwen3-8B-Base. It is designed specifically for "thinking" mode tasks, using sequential, domain-wise reinforcement learning to reach best-in-class performance across reasoning, alignment, mathematical, and coding benchmarks. Its strength in complex reasoning makes it well suited to applications that demand advanced problem-solving and analytical thought.

Overview

NVIDIA's Nemotron-Cascade-8B-Thinking is an 8 billion parameter general-purpose language model, built upon the Qwen3-8B-Base architecture. It distinguishes itself through a unique training pipeline involving multi-stage Supervised Fine-Tuning (SFT) followed by Cascade Reinforcement Learning (RL) across multiple domains. This model is exclusively optimized for a "thinking" mode, enhancing its ability to perform complex reasoning tasks.

Key Capabilities

  • Advanced Reasoning: Achieves best-in-class performance across a diverse set of benchmarks including general-knowledge reasoning, mathematical reasoning (e.g., AIME 2024/2025), and competitive programming (LiveCodeBench).
  • Reinforcement Learning Enhancement: Uses RLHF as a preparatory stage to substantially boost complex reasoning, followed by domain-wise RLVR (reinforcement learning with verifiable rewards) stages that refine performance in each domain without degrading earlier gains.
  • Code Performance: Demonstrates strong results on coding benchmarks such as LiveCodeBench (LCB v5, v6) and SWE-bench Verified, with scores comparable to much larger models such as DeepSeek-R1-0528 (671B).
  • Alignment and Instruction Following: Shows robust performance in alignment benchmarks such as ArenaHard and IFBench.
  • Optimized for "Thinking" Mode: Designed specifically for tasks requiring deep analytical thought; its chat template expects a " /think" tag appended to the user input (see the sketch after this list).
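
A minimal sketch of running the model in thinking mode with Hugging Face transformers, assuming a Qwen3-style chat template. The " /think" suffix on the user turn follows the card's note, and the sampling settings from the recommendations below are applied; the exact tag placement is an assumption to verify against the tokenizer's bundled chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Cascade-8B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Per the model card, the chat template expects a " /think" tag on the user turn.
messages = [
    {"role": "user", "content": "How many primes are there below 100? /think"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended sampling settings from the card: temperature 0.6, top_p 0.95.
output = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```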

Usage Recommendations

  • Sampling Parameters: For local deployment, temperature = 0.6 and top_p = 0.95 are recommended (both are applied in the sketches on this page).
  • Long Context Support: Extended context lengths are supported via RoPE scaling with the YaRN method; a scaling factor of 2.0 is recommended for this model across all benchmarks (see the loading sketch after this list).
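
For longer inputs, the YaRN factor can be supplied at load time. The following is a hedged sketch using transformers' rope_scaling override with a Qwen3-style schema; everything other than the factor of 2.0 (which comes from this card) is an assumption to check against the model's config.json.

```python
from transformers import AutoModelForCausalLM

# Hedged sketch: enable YaRN RoPE scaling when loading with transformers.
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-Cascade-8B-Thinking",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 2.0,  # recommended in the model card
        "original_max_position_embeddings": 32768,  # assumed; verify in config.json
    },
)
```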

Good For

  • Applications requiring strong general-purpose reasoning and problem-solving.
  • Tasks involving mathematical and logical deduction.
  • Code generation and software engineering challenges.
  • Scenarios where a model's "thought process" or intermediate reasoning steps are beneficial.