Overview
AceReason-Nemotron-14B: RL-Enhanced Math and Code Reasoning
AceReason-Nemotron-14B is a 14-billion-parameter model from NVIDIA, distinguished by its training methodology: it is developed entirely through reinforcement learning (RL), starting from the DeepSeek-R1-Distilled-Qwen-14B base. This approach substantially boosts its performance on complex math and code reasoning tasks.
Key Capabilities & Performance
- Reinforcement Learning Focus: The model's core differentiator is its RL-only training, which has been shown to elicit and push the limits of foundational reasoning abilities.
- Exceptional Math Reasoning: Achieves 78.6% on AIME 2024 (a +8.9% improvement) and 67.4% on AIME 2025 (a +17.4% improvement), demonstrating strong performance in advanced mathematical problem-solving.
- Robust Code Generation: Scores 61.1% on LiveCodeBench v5 (+8%) and 54.9% on LiveCodeBench v6 (+7%), and reaches a Codeforces rating of 2024 (+543), indicating strong proficiency in generating and understanding code.
- Strategic RL Training: NVIDIA's research shows that initial RL training on math-only prompts enhances both math and code reasoning, with subsequent code-only RL further improving code performance while maintaining math scores.
- High Context Length: Supports a 32,768-token context window, useful for intricate problems that require long reasoning chains or extensive context.
Usage Recommendations
- No System Prompt: Do not use a system prompt; integrate all instructions directly into the user prompt.
- Math Instruction: For math questions, use "Please reason step by step, and put your final answer within \boxed{}."
- Code Instruction: For code questions, append the recommended formatting instructions to the question, stating whether the solution should build on provided starter code and asking for the code inside a delimited code block.
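The recommendations above can be sketched as a small prompt-building helper. This is an illustrative sketch, not an official API: the function and constant names are hypothetical, and only the quoted math instruction comes from the usage notes.

```python
# Recommended math instruction, quoted from the usage recommendations.
MATH_INSTRUCTION = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)


def build_math_prompt(question: str) -> str:
    """Append the recommended math instruction to the raw question.

    All instructions live in the user turn: the model is used without
    a system prompt, so nothing is placed in a "system" role.
    """
    return f"{question}\n\n{MATH_INSTRUCTION}"


# Hypothetical chat-style message list: a single user turn, no system turn.
messages = [{"role": "user", "content": build_math_prompt("Compute 2 + 2.")}]
print(messages[0]["content"])
```

When running the model through a chat interface, a message list like `messages` above would typically be passed through the tokenizer's chat template; the key point is simply that no system message is included.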