HasuerYu/KnowRL-Nemotron-1.5B
Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

KnowRL-Nemotron-1.5B is a 1.5-billion-parameter math reasoning model developed by HasuerYu, fine-tuned from nvidia/OpenMath-Nemotron-1.5B. It uses reinforcement learning (DAPO/GRPO) guided by minimal-sufficient knowledge points (KPs) to achieve state-of-the-art results among 1.5B-scale models on competition-level math benchmarks. The model excels at complex mathematical problem solving, retains improved reasoning even without explicit KP hints at inference, and supports a 32,768-token context length.


KnowRL-Nemotron-1.5B: Enhanced Math Reasoning with Minimal Knowledge Guidance

KnowRL-Nemotron-1.5B is a 1.5 billion parameter model developed by HasuerYu, specifically designed for competition-level math reasoning. Fine-tuned from nvidia/OpenMath-Nemotron-1.5B, this model leverages advanced reinforcement learning techniques (DAPO/GRPO) combined with a novel approach to knowledge guidance.
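The snippet below is a minimal loading-and-inference sketch using the Hugging Face transformers API. The chat-template call is an assumption based on the model's OpenMath-Nemotron lineage; if the repository does not ship a chat template, encode the problem as plain text instead.

```python
# Minimal inference sketch for KnowRL-Nemotron-1.5B.
# Assumptions: the repo ID below, BF16 weights, and a chat template
# inherited from the OpenMath-Nemotron base; verify against the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HasuerYu/KnowRL-Nemotron-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

problem = "Compute the remainder when 7^2024 is divided by 100."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": problem}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# A generous max_new_tokens leaves room for long step-by-step
# derivations (the context window supports up to 32,768 tokens).
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```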

Key Capabilities & Innovations

  • State-of-the-Art Math Reasoning: Achieves an average accuracy of 74.16% (CSS strategy) across 8 competition-level math benchmarks, the best reported result among 1.5B-scale models.
  • Minimal-Sufficient Knowledge Guidance: Instead of providing lengthy solution hints, KnowRL decomposes guidance into atomic "knowledge points" (KPs) and identifies the minimal subset required for effective learning. This allows for efficient training with approximately 38% fewer KPs compared to full-KP injection.
  • Genuine Policy Improvement: Demonstrates 70.08% average accuracy even without KP hints at inference, indicating a significant and genuine improvement in the model's underlying reasoning policy (+9.63% over baseline).
  • Reduced Reward Sparsity: KP-guided training cut the zero-correct rate from 41.21% to 13.00%, yielding more stable and efficient learning.
  • Long Context: Supports responses of up to 32,768 tokens, leaving room for detailed step-by-step reasoning.

When to Use This Model

  • Competition-Level Math: Ideal for tasks requiring advanced mathematical reasoning, such as those found in AIME, BRUMO, HMMT, and Olympiad-style problems.
  • Efficient Reasoning: When you need a powerful math reasoning model that can perform well even without explicit external hints during inference.
  • Research in RL for Reasoning: Provides a strong baseline and methodology for exploring reinforcement learning with knowledge guidance in LLMs.

While optimized for math, its performance on other domains has not been evaluated. For best results, prepend the selected knowledge points to the prompt when they are available; the model still performs well without them, as sketched below.
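The helper below illustrates KP-prefixed prompting. The hint wording and layout are hypothetical, meant only to show the with/without-hints pattern; the exact template used during KnowRL training may differ.

```python
# Hypothetical sketch of KP-guided prompting: prepend the selected
# minimal-sufficient knowledge points to the problem statement.
# The "Relevant knowledge points" wording is an illustrative assumption,
# not the verified KnowRL training template.
def build_prompt(problem: str, knowledge_points: list[str] | None = None) -> str:
    if not knowledge_points:
        # No hints available: rely on the model's learned reasoning policy.
        return problem
    kp_block = "\n".join(f"- {kp}" for kp in knowledge_points)
    return f"Relevant knowledge points:\n{kp_block}\n\nProblem:\n{problem}"

prompt = build_prompt(
    "Compute the remainder when 7^2024 is divided by 100.",
    knowledge_points=["Euler's theorem", "phi(100) = 40"],
)
print(prompt)
```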