nvidia/Nemotron-Cascade-8B

Warm · Public · 8B · FP8 · 32,768-token context · License: other · Hugging Face
Overview

Nemotron-Cascade-8B: General-Purpose Reasoning with Cascade RL

Nemotron-Cascade-8B, developed by NVIDIA, is an 8-billion-parameter general-purpose model post-trained from Qwen3-8B-Base. Its key differentiator is a sequential, domain-wise reinforcement learning (Cascade RL) pipeline that strengthens complex reasoning beyond standard preference optimization. The model supports both 'thinking' and 'instruct' modes, letting users explicitly control whether it reasons step by step before answering.

Key Capabilities

  • Advanced Reasoning: Excels in general-knowledge reasoning, mathematical problem-solving, and competitive programming tasks.
  • Dual Operation Modes: Supports explicit 'thinking' and 'instruct' (non-reasoning) modes, controlled via chat template tags (/think or /no_think); see the usage sketch after this list.
  • Competitive with Larger Models: Achieves performance comparable to much larger models (e.g., the 671B DeepSeek-R1-0528) on benchmarks such as LiveCodeBench (LCB) and LCB Pro, demonstrating strong code and software-engineering capabilities.
  • Long Context Support: Recommended usage includes RoPE scaling with the YaRN method to extend the context length up to 64K tokens (see the configuration sketch after this list).
  • Robust Alignment: Shows strong performance on alignment and instruction-following benchmarks such as ArenaHard and IFBench.
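
A minimal sketch of switching between the two modes, assuming a standard Hugging Face Transformers chat template that honors a trailing /no_think tag in the user turn; the exact tag handling is defined by the model's own chat template, so consult the official model card for authoritative usage:

```python
# Sketch: toggling 'thinking' vs. 'instruct' mode via the chat template.
# Assumes the template interprets a trailing /no_think tag in the user message.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Cascade-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def generate(prompt: str, thinking: bool = True) -> str:
    # Append /no_think to request a direct answer without a reasoning trace.
    tag = "" if thinking else " /no_think"
    messages = [{"role": "user", "content": prompt + tag}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=1024)
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 23?", thinking=True))   # reasoning ('thinking') mode
print(generate("What is 17 * 23?", thinking=False))  # direct 'instruct' mode
```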
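
A minimal sketch of enabling YaRN RoPE scaling for roughly 64K-token contexts with Transformers. The scaling factor of 2.0 (32,768 to 65,536 tokens) and the rope_scaling keys follow the common YaRN configuration format and are assumptions here; use the values recommended in the official model card. Serving frameworks such as vLLM expose equivalent options (e.g., --rope-scaling and --max-model-len).

```python
# Sketch: extending the native 32,768-token window to ~65,536 tokens with YaRN.
# The factor and key names below are assumed; defer to the official model card.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Cascade-8B"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 65536

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```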

Good for

  • Applications requiring complex reasoning and problem-solving, especially in mathematics and coding.
  • Scenarios where explicit control over the model's reasoning process (thinking vs. direct instruction) is beneficial.
  • Developers seeking a highly capable 8B parameter model that can compete with much larger models on specific reasoning tasks.
  • Use cases demanding long context understanding and generation, such as analyzing extensive codebases or detailed documentation.