HasuerYu/KnowRL-Nemotron-1.5B
Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

KnowRL-Nemotron-1.5B is a 1.5-billion-parameter math reasoning model developed by HasuerYu, fine-tuned from nvidia/OpenMath-Nemotron-1.5B. It uses reinforcement learning (DAPO/GRPO) guided by minimal-sufficient knowledge points (KPs) to achieve state-of-the-art results among 1.5B-scale models on competition-level math benchmarks. The model excels at complex mathematical problem solving, retains improved reasoning even without explicit KP hints at inference, and supports a 32,768-token context length.


KnowRL-Nemotron-1.5B: Enhanced Math Reasoning with Minimal Knowledge Guidance

KnowRL-Nemotron-1.5B is a 1.5 billion parameter model developed by HasuerYu, specifically designed for competition-level math reasoning. Fine-tuned from nvidia/OpenMath-Nemotron-1.5B, this model leverages advanced reinforcement learning techniques (DAPO/GRPO) combined with a novel approach to knowledge guidance.
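The snippet below is a minimal loading-and-inference sketch using the Hugging Face transformers API. The chat-template call is an assumption based on the model's OpenMath-Nemotron lineage; if the repository does not ship a chat template, encode the problem as plain text instead.

```python
# Minimal inference sketch for KnowRL-Nemotron-1.5B.
# Assumptions: the repo ID below, BF16 weights, and a chat template
# inherited from the OpenMath-Nemotron base; verify against the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HasuerYu/KnowRL-Nemotron-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

problem = "Compute the remainder when 7^2024 is divided by 100."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": problem}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# A generous max_new_tokens leaves room for long step-by-step
# derivations (the context window supports up to 32,768 tokens).
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```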

Key Capabilities & Innovations

  • State-of-the-Art Math Reasoning: Achieves an average accuracy of 74.16% (CSS strategy) across 8 competition-level math benchmarks, the best reported result among 1.5B-scale models.
  • Minimal-Sufficient Knowledge Guidance: Instead of providing lengthy solution hints, KnowRL decomposes guidance into atomic "knowledge points" (KPs) and identifies the minimal subset required for effective learning. This allows for efficient training with approximately 38% fewer KPs compared to full-KP injection.
  • Genuine Policy Improvement: Demonstrates 70.08% average accuracy even without KP hints at inference, indicating a significant and genuine improvement in the model's underlying reasoning policy (+9.63% over baseline).
  • Reduced Reward Sparsity: KP-guided training cut the zero-correct rate from 41.21% to 13.00%, yielding more stable and efficient learning.
  • Long Context: Supports responses of up to 32,768 tokens, leaving room for detailed step-by-step reasoning.

When to Use This Model

  • Competition-Level Math: Ideal for tasks requiring advanced mathematical reasoning, such as those found in AIME, BRUMO, HMMT, and Olympiad-style problems.
  • Efficient Reasoning: When you need a powerful math reasoning model that can perform well even without explicit external hints during inference.
  • Research in RL for Reasoning: Provides a strong baseline and methodology for exploring reinforcement learning with knowledge guidance in LLMs.

While optimized for math, its performance on other domains has not been evaluated. For best results, prepend the selected knowledge points to the prompt when they are available; the model still performs well without them, as sketched below.
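The helper below illustrates KP-prefixed prompting. The hint wording and layout are hypothetical, meant only to show the with/without-hints pattern; the exact template used during KnowRL training may differ.

```python
# Hypothetical sketch of KP-guided prompting: prepend the selected
# minimal-sufficient knowledge points to the problem statement.
# The "Relevant knowledge points" wording is an illustrative assumption,
# not the verified KnowRL training template.
def build_prompt(problem: str, knowledge_points: list[str] | None = None) -> str:
    if not knowledge_points:
        # No hints available: rely on the model's learned reasoning policy.
        return problem
    kp_block = "\n".join(f"- {kp}" for kp in knowledge_points)
    return f"Relevant knowledge points:\n{kp_block}\n\nProblem:\n{problem}"

prompt = build_prompt(
    "Compute the remainder when 7^2024 is divided by 100.",
    knowledge_points=["Euler's theorem", "phi(100) = 40"],
)
print(prompt)
```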