ramankrishna10/npc-reason
ramankrishna10/npc-reason is a 1.5 billion parameter math-reasoning model, specialized from DeepSeek-R1-Distill-Qwen-1.5B by Rama Krishna Bachu and Bottensor. It is designed to emit a mechanically checkable assertion for every arithmetic step, so its reasoning chain can be verified externally. On a frozen held-out evaluation set (GSM8K + MATH-500) it reaches 59.6% verified-and-correct accuracy, and it supports a 32768-token context length. Its primary strength is verifiable, grounded reasoning on arithmetic and arithmetic-reducible word problems.
NPC Reason 1.5B: Verifiable Math Reasoning
ramankrishna10/npc-reason is a 1.5 billion parameter model, developed by Rama Krishna Bachu and Bottensor, specifically engineered for math reasoning. It distinguishes itself by generating a mechanically checkable assertion (<<EXPR = RESULT>>) for every arithmetic step, so its reasoning process can be verified externally. As a result, the verifiable rate is not a subjective judgment: a pure-code checker can confirm it.
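The shipped verifier/step_verifier.py is the authoritative checker, and its internals are not described on this card. Purely as an illustration, here is a minimal sketch of how such a pure-code check could work. It assumes each assertion appears literally as <<EXPR = RESULT>>, that EXPR and RESULT use only numbers, + - * /, unary minus, and parentheses, and that a chain counts as verifiable only if it contains at least one assertion and every assertion holds. The names ASSERTION_RE, check_assertion, and verify_chain are hypothetical.

```python
import ast
import operator
import re
from fractions import Fraction

# Hypothetical pattern for <<EXPR = RESULT>> assertions (EXPR may not contain "=").
ASSERTION_RE = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")

# Only these binary operators are accepted; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST) -> Fraction:
    """Evaluate a restricted arithmetic AST exactly, using Fraction."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        # str() first so 0.1 becomes Fraction(1, 10), not the binary float value.
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def check_assertion(expr: str, result: str) -> bool:
    """Return True iff EXPR evaluates exactly to RESULT (both parsed the same way)."""
    try:
        lhs = _eval(ast.parse(expr.strip(), mode="eval"))
        rhs = _eval(ast.parse(result.strip(), mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return lhs == rhs


def verify_chain(text: str) -> dict:
    """Check every <<EXPR = RESULT>> assertion found in one model output."""
    steps = [(e.strip(), r.strip(), check_assertion(e, r))
             for e, r in ASSERTION_RE.findall(text)]
    return {
        "n_assertions": len(steps),
        "n_failed": sum(not ok for _, _, ok in steps),
        # Assumption: a chain with no assertions is NOT verifiable.
        "verifiable": bool(steps) and all(ok for _, _, ok in steps),
        "steps": steps,
    }


sample = "3 packs of 12: <<3 * 12 = 36>> eggs, half used: <<36 / 2 = 19>>."
report = verify_chain(sample)
print(report["verifiable"], report["n_failed"])  # False 1
```

Using Fraction rather than float keeps the check exact, so a step like <<0.1 + 0.2 = 0.3>> passes while a rounded result such as <<1 / 3 = 0.33>> fails. A real checker might instead apply a numeric tolerance; that choice is not documented here.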
Key Capabilities & Performance
- Mechanically Verifiable Reasoning: Unlike its base model, which produced zero verifiable chains, NPC Reason achieves a 76.2% verifiable rate on a held-out evaluation set (GSM8K + MATH-500).
- Improved Accuracy: On the same set, the model reaches 66.6% accuracy and a 59.6% verified-and-correct rate, indicating that verifiability was not gained at the expense of correctness.
- Stable RL Refinement: The model was refined with RLVR/GRPO, using a hard, frozen, pure-code verifier as the reward. Training remained stable, in contrast to earlier RL attempts in related work that proved unstable.
- Included Verifier: The verifier/step_verifier.py tool is shipped with the model, allowing users to independently verify the model's outputs.
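The three headline numbers can be read as fractions of the same evaluation set. Assuming "verified-and-correct" counts outputs that pass the checker and also reach the reference answer, it can never exceed either the verifiable rate or the accuracy, which matches 59.6% ≤ 66.6% ≤ 76.2%. The card also does not say how the verifier's verdict becomes a GRPO reward, so the sketch assumes a binary reward (1.0 only when verifiable and correct) and uses the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation). EvalRecord, summarize, binary_reward, and group_relative_advantages are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EvalRecord:
    """One graded model output (hypothetical schema)."""
    verifiable: bool  # every <<EXPR = RESULT>> assertion checked out
    correct: bool     # final answer matches the reference answer


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Compute the three rates over one evaluation set."""
    n = len(records)
    return {
        "verifiable_rate": sum(r.verifiable for r in records) / n,
        "accuracy": sum(r.correct for r in records) / n,
        # Assumption: both conditions must hold for the same output.
        "verified_and_correct": sum(r.verifiable and r.correct for r in records) / n,
    }


def binary_reward(record: EvalRecord) -> float:
    """Hypothetical reward: 1.0 only when the chain is verifiable AND correct."""
    return 1.0 if (record.verifiable and record.correct) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one prompt's sample group.

    Uses the population standard deviation; eps avoids division by zero when
    every sample in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


group = [EvalRecord(True, True), EvalRecord(True, False),
         EvalRecord(False, True), EvalRecord(False, False)]
print(summarize(group))
print(group_relative_advantages([binary_reward(r) for r in group]))
```

One property of this design worth noting: if every sample for a prompt receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason a hard binary verifier reward depends on the model sometimes succeeding and sometimes failing on the same problem.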
Intended Use Cases
- Math Problems with Grounded Reasoning: Ideal for arithmetic and arithmetic-reducible word problems where transparent, checkable reasoning steps are crucial.
- Research & Simulation: Serves as a simulation/research artifact for exploring verifiable reasoning in LLMs.
Limitations
- Math-First Focus: Not intended for general chat, logic, proofs, or general chain-of-thought tasks.
- Verifiability Ceiling: The model did not meet its pre-registered 90% verifiable bar, remaining near 77%.
- Research Artifact: Users should verify outputs with the included checker before relying on them.