## Model Overview
This model, `daydreamwarrior/Nemotron-Research-GooseReason-4B-Instruct-heretic-v2`, is a 4-billion-parameter instruction-tuned causal language model. It is a decensored variant of `nvidia/Nemotron-Research-GooseReason-4B-Instruct`, produced with the Heretic v1.2.0 tool. The underlying base model is Qwen3-4B-Instruct, enhanced through Reinforcement Learning with Verifiable Rewards (RLVR) using the Golden Goose pipeline and the GooseReason-0.7M dataset.
## Key Capabilities & Differentiators
- Enhanced Reasoning: Achieves new state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks, including mathematics, programming, STEM reasoning, instruction following, and logical puzzles.
- Golden Goose Pipeline: Utilizes a unique method to synthesize unlimited RLVR tasks from reasoning-rich, previously unverifiable internet text (e.g., science textbooks, Olympiad math forums, cybersecurity web scrapes). This addresses the bottleneck of scarce verifiable training data.
- GooseReason-0.7M Dataset: Trained on a large-scale RLVR dataset with over 0.7 million tasks, spanning Math (AoPS-Instruct), Code (rStar-Coder), and STEM (MegaScience).
- Superior Performance: Demonstrates significant absolute gains on mathematical (+2.18%) and coding (+2.24%) benchmarks compared to its base model, even outperforming the 30B-parameter Qwen3-30B-Instruct on coding tasks.
- Decensored Version: This variant has undergone a decensoring process, as indicated by its `heretic-v2` suffix, reducing the refusal rate from 99/100 on the original model to 5/100.
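To make the RLVR idea above concrete, the sketch below shows what a "verifiable reward" can look like in its simplest form: a programmatic check that scores a model's final answer against a known-correct one. This is an illustrative example only, not the actual Golden Goose pipeline or its verifiers; the function names and the last-line-answer convention are assumptions.

```python
def normalize_answer(text: str) -> str:
    """Normalize an answer string for comparison: strip whitespace,
    trailing periods, and case differences."""
    return text.strip().rstrip(".").lower()


def verifiable_reward(model_output: str, gold_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the gold answer,
    else 0.0. Assumes (hypothetically) that the final answer is the
    last non-empty line of the model's output."""
    lines = [line for line in model_output.strip().splitlines() if line.strip()]
    if not lines:
        return 0.0
    return 1.0 if normalize_answer(lines[-1]) == normalize_answer(gold_answer) else 0.0


# A task synthesized from reasoning-rich text, once paired with a
# checkable answer like this, can be scored automatically during RL training.
reward = verifiable_reward("The triangle's area works out as follows...\n42", "42")
```

Binary exact-match rewards like this are what makes previously unverifiable internet text usable for RL once a checkable answer has been extracted from it.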
## Intended Use
This model is primarily intended for research and development purposes, particularly for tasks requiring strong reasoning capabilities in technical domains. Its decensored nature may also be relevant for use cases where broader response generation is desired.
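A minimal loading sketch with Hugging Face `transformers`, assuming the model follows the standard Qwen3-style chat template and is available on the Hub under the ID above (adjust dtype and device placement for your hardware):

```python
# Minimal usage sketch; requires `transformers` and `torch` installed,
# and downloads the model weights on first run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "daydreamwarrior/Nemotron-Research-GooseReason-4B-Instruct-heretic-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The prompt here is arbitrary; reasoning-heavy technical questions play to the model's stated strengths.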