rstar2-reproduce/rStar2-Agent-14B
Text generation · 14B parameters · FP8 · 32k context · Published: Aug 28, 2025 · License: MIT · Architecture: Transformer · Open weights

rstar2-reproduce/rStar2-Agent-14B is a 14 billion parameter math reasoning model that achieves performance comparable to 67B-scale models through agentic reinforcement learning. Developed as part of the rStar2-Agent research, it excels at planning, reasoning, and autonomously using coding tools for complex problem-solving. The model is optimized for mathematical tasks, efficiently exploring, verifying, and reflecting on intermediate results as it works toward a solution. Its primary use case is advanced math reasoning and problem-solving that leverages these agentic capabilities.


rStar2-Agent-14B: Advanced Agentic Reasoning Model

rStar2-Agent-14B is a 14 billion parameter model developed as part of the rStar2-Agent research, detailed in its technical report. This model demonstrates advanced agentic reasoning capabilities, achieving math reasoning performance comparable to much larger 67B models through pure agentic reinforcement learning.

Key Capabilities

  • Agentic Reasoning: Excels at planning, reasoning, and autonomous problem-solving.
  • Tool Use: Capable of efficiently using coding tools (specifically Python code execution) to explore, verify, and reflect during complex tasks.
  • Math Problem Solving: Optimized for mathematical reasoning, enabling it to tackle intricate math problems by breaking them down and validating solutions.
  • Reproducible Performance: The model's math evaluation results are designed to be reproducible, with official code and training recipes available on the GitHub repository.

Usage and Integration

The model can be served using SGLang and integrated via an OpenAI-compatible API. It supports tool calling, allowing it to interact with external functions such as execute_python_code_with_standard_io for real-time code execution and result processing. This enables an iterative problem-solving loop in which the model generates code, executes it, and uses the output to refine its reasoning.
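The loop described above can be sketched in Python. This is a minimal sketch, not the official recipe: the tool name `execute_python_code_with_standard_io` comes from this card, but the JSON parameter schema, the local subprocess-based executor, and the `run_agent_turn` helper below are assumptions for illustration.

```python
import json
import subprocess
import sys

# Hypothetical local stand-in for the execute_python_code_with_standard_io tool:
# runs a Python snippet in a subprocess, feeding it stdin and capturing stdout.
def execute_python_code_with_standard_io(code: str, stdin: str = "") -> str:
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# Assumed tool schema in the OpenAI function-calling format; the parameter
# names the rStar2-Agent training recipe actually uses may differ.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "execute_python_code_with_standard_io",
        "description": "Run Python code with optional standard input and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string"},
                "stdin": {"type": "string"},
            },
            "required": ["code"],
        },
    },
}]

def run_agent_turn(client, model: str, messages: list) -> list:
    """One reason-act iteration: query the model, execute any tool calls it
    emits, and append the results so the model can refine its reasoning."""
    response = client.chat.completions.create(
        model=model, messages=messages, tools=TOOLS
    )
    msg = response.choices[0].message
    messages.append(msg)
    for call in msg.tool_calls or []:
        args = json.loads(call.function.arguments)
        output = execute_python_code_with_standard_io(
            args["code"], args.get("stdin", "")
        )
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": output}
        )
    return messages
```

With the model served locally by SGLang, `client` would be an OpenAI-compatible client pointed at the server's base URL (host, port, and launch flags depend on your SGLang setup); `run_agent_turn` is then called in a loop until the model returns a message with no tool calls.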