GooseReason-4B-Instruct Overview
GooseReason-4B-Instruct is a 4-billion-parameter instruction-tuned model developed by NVIDIA and optimized for complex reasoning tasks. It is built on Qwen3-4B-Instruct and further trained with Reinforcement Learning with Verifiable Rewards (RLVR), using the Golden Goose pipeline and the GooseReason-0.7M dataset.
Key Capabilities
- Advanced Reasoning: Achieves state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks.
- Mathematics Proficiency: Improves on math benchmarks such as AIME, AMC, MATH, Minerva, and OlympiadBench, with a +2.18% absolute gain in average math performance.
- Strong Coding Performance: Shows a +2.24% absolute gain in coding average across APPS, CodeContests, CodeForces, TACO, HumanEvalPlus, and LiveCodeBench, outperforming Qwen3-30B-Instruct.
- STEM and Logic: Improved performance on STEM reasoning (GPQA Diamond), instruction following (IFEval), and various logical puzzles in Reasoning Gym.
- Scalable RLVR Training: Uses the Golden Goose pipeline to synthesize over 0.7 million verifiable tasks from internet text that was previously unusable for RLVR (e.g., Olympiad math forums, science textbooks, competitive programming problems without test cases), addressing the scarcity of RLVR training data.
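To make "verifiable reward" concrete, here is a minimal sketch of the general RLVR idea: a reward that a program can compute by checking a model's final answer against a known reference. It is not the Golden Goose pipeline or NVIDIA's training code. The function names, the `\boxed{...}` answer convention, and the exact-match rule are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, or None.

    Assumption: the model is prompted to put its final answer in \\boxed{...}.
    Nested braces are not handled in this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 on an exact match with the reference.

    A real verifier would likely normalize answers (e.g. 0.5 vs 1/2) or,
    for code tasks, run test cases; this sketch compares strings only.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable final answer, so nothing to verify
    return 1.0 if answer == reference.strip() else 0.0
```

Because the reward is computed by a checker rather than a learned judge, a task is only usable for RLVR when it comes with something to check against, which is why text without reference answers or test cases is hard to train on directly.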
Good For
- Research and Development: Designed for work on advanced reasoning and problem-solving.
- Mathematical Problem Solving: Ideal for applications requiring high accuracy in complex mathematical challenges.
- Code Generation and Analysis: Suitable for tasks involving competitive programming problems and general code reasoning.
- Scientific and Logical Reasoning: Effective for STEM-related question answering and solving logical puzzles.
- Exploring RLVR Techniques: Provides a strong baseline for further research into Reinforcement Learning with Verifiable Rewards using synthesized datasets.