ramankrishna10/npc-reason

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 16, 2026 | License: mit | Architecture: Transformer | Open Weights | Cold

ramankrishna10/npc-reason is a 1.5 billion parameter math-reasoning model, specialized from DeepSeek-R1-Distill-Qwen-1.5B by Rama Krishna Bachu and Bottensor. It is designed to emit a mechanically checkable assertion for every arithmetic step, so that its reasoning chain can be verified externally. The model achieves 59.6% verified-and-correct accuracy on a frozen held-out evaluation set (GSM8K + MATH-500) and supports a 32,768-token context length. Its primary strength is verifiable, grounded reasoning for arithmetic and arithmetic-reducible word problems.


NPC Reason 1.5B: Verifiable Math Reasoning

ramankrishna10/npc-reason is a 1.5 billion parameter model, developed by Rama Krishna Bachu and Bottensor, specifically engineered for math reasoning. It generates a mechanically checkable assertion of the form <<EXPR = RESULT>> for every arithmetic step, which lets an external tool verify its reasoning process. Because a pure-code checker confirms each assertion, the "verifiable rate" is an objective measurement rather than a subjective judgment.
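To make the <<EXPR = RESULT>> format concrete, here is a minimal sketch of what a pure-code step checker could look like. It is not the shipped verifier/step_verifier.py, whose interface is not described here. The names `ASSERTION`, `check_step`, and `verify_chain` are hypothetical, the supported operators (+, -, *, /) are an assumption, and so is the rule that a chain counts as verifiable only if it contains at least one assertion and every assertion holds.

```python
import ast
import operator
import re
from fractions import Fraction

# Hypothetical pattern for the <<EXPR = RESULT>> assertion format.
ASSERTION = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")

# Assumption: only the four basic arithmetic operators are allowed.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a restricted arithmetic AST exactly, using Fractions."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        # Going through str() keeps decimals such as 0.1 exact.
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def check_step(expr: str, result: str) -> bool:
    """Return True if EXPR evaluates exactly to RESULT."""
    try:
        lhs = _eval(ast.parse(expr.strip(), mode="eval"))
        rhs = _eval(ast.parse(result.strip(), mode="eval"))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
    return lhs == rhs


def verify_chain(text: str) -> bool:
    """Assumed rule: at least one assertion, and all of them must hold."""
    steps = ASSERTION.findall(text)
    return bool(steps) and all(check_step(e, r) for e, r in steps)
```

For example, `verify_chain("She sold <<48 / 2 = 24>> clips in May, so <<48 + 24 = 72>> in total.")` passes, while a chain containing `<<3 * 4 = 13>>` fails. Parsing with `ast` instead of calling `eval` keeps the checker from executing arbitrary code found in model output.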

Key Capabilities & Performance

  • Mechanically Verifiable Reasoning: Unlike its base model, which produced zero verifiable chains, NPC Reason achieves a 76.2% verifiable rate on a held-out evaluation set (GSM8K + MATH-500).
  • Improved Accuracy: The model reaches 66.6% accuracy and a 59.6% verified-and-correct rate, indicating that verifiability was not gained at the expense of correctness.
  • Stable RL Refinement: Training used RLVR/GRPO with a hard, frozen, pure-code verifier as the reward. The run remained stable, in contrast to the unstable RL attempts reported in related work.
  • Included Verifier: The verifier/step_verifier.py tool is shipped with the model, allowing users to independently verify the model's outputs.
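The three figures above can be read as per-problem rates. The sketch below shows one way they might be computed, assuming each evaluated problem yields two flags: whether its chain passed the checker, and whether its final answer matched the reference. The exact definitions used in the evaluation are not spelled out here, so `EvalRecord` and `summarize` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    verified: bool  # assumed: every assertion in the chain passed the checker
    correct: bool   # assumed: final answer matched the reference answer


def summarize(records):
    """Compute the three rates over one evaluation set."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    return {
        "verifiable_rate": sum(r.verified for r in records) / n,
        "accuracy": sum(r.correct for r in records) / n,
        # Joint rate: the chain verified AND the answer was correct.
        "verified_and_correct": sum(r.verified and r.correct for r in records) / n,
    }
```

Under these definitions the verified-and-correct rate can never exceed either of the other two, which is consistent with 59.6% sitting below both 76.2% and 66.6%.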

Intended Use Cases

  • Math Problems with Grounded Reasoning: Ideal for arithmetic and arithmetic-reducible word problems where transparent, checkable reasoning steps are crucial.
  • Research & Simulation: Serves as a simulation/research artifact for exploring verifiable reasoning in LLMs.

Limitations

  • Math-First Focus: Not intended for general chat, logic, proofs, or general chain-of-thought tasks.
  • Verifiability Ceiling: The model did not meet its pre-registered 90% verifiable bar; its verifiable rate remained near 77% (76.2% on the held-out evaluation set).
  • Research Artifact: Users should verify outputs with the included checker before relying on them.