MetaStoneTec/MetaStone-S1-32B

Text generation · Model size: 32.8B · Quantization: FP8 · Context length: 32k · Published: Jul 5, 2025 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 2

MetaStoneTec/MetaStone-S1-32B is a 32.8 billion parameter reflective generative model developed by MetaStoneTec, featuring a 131072 token context length. It is trained with a novel approach combining "Long-CoT Reinforcement Learning" and "Process Reward Learning," enabling both deep reasoning and the selection of high-quality reasoning trajectories. This architecture significantly reduces inference costs while maintaining performance comparable to larger models, such as the OpenAI-o3 series, on mathematics, coding, and Chinese reasoning tasks.


Overview

MetaStone-S1-32B is a 32.8 billion parameter reflective generative model developed by MetaStoneTec. It introduces a novel reflective generative form that unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." This training methodology allows the model to achieve deep reasoning while efficiently selecting high-quality reasoning trajectories. By sharing the backbone network between the policy model and the Process Reward Model (PRM), MetaStone-S1-32B reduces PRM inference cost by 99%, leading to faster and higher-quality responses.
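The trajectory-selection step described above can be sketched as a best-of-N loop: the policy samples several candidate reasoning chains, a process-reward scorer rates each one, and the highest-scoring chain is returned. The sketch below is illustrative only; `sample_trajectories` and `score_trajectory` are hypothetical stand-ins for the model's real sampling procedure and shared-backbone PRM head, not an actual API.

```python
# Illustrative best-of-N trajectory selection, as described for
# reflective generative models: the policy proposes candidate
# reasoning chains and a process-reward scorer picks the best one.
# Both helpers below are hypothetical placeholders.

def sample_trajectories(prompt: str, n: int) -> list[str]:
    # Placeholder: a real policy model would sample n chains-of-thought.
    return [f"{prompt} -> reasoning path {i}" for i in range(n)]

def score_trajectory(trajectory: str) -> float:
    # Placeholder: the shared-backbone PRM would score the reasoning
    # steps; here trajectory length serves as a dummy proxy.
    return float(len(trajectory))

def best_of_n(prompt: str, n: int = 4) -> str:
    candidates = sample_trajectories(prompt, n)
    scores = [score_trajectory(t) for t in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]

print(best_of_n("Solve 2 + 2", n=4))
```

In a real deployment the scorer would rate each intermediate reasoning step rather than the whole string, but the selection logic is the same.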

Key Capabilities

  • Advanced Reasoning: Excels in complex mathematics, coding, and Chinese reasoning tasks.
  • Efficient Inference: Achieves substantial reduction in PRM inference cost due to shared backbone architecture.
  • Competitive Performance: Demonstrates performance comparable to larger models, including the OpenAI-o3 series, despite its 32.8B parameter size.
  • Long Context: Supports a context length of 131072 tokens.
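The claimed 99% reduction in PRM inference cost follows from the shared backbone: a standalone PRM must re-encode every candidate trajectory with its own full forward pass, while a shared-backbone PRM reuses the hidden states the policy already computed and only runs a small scoring head. The back-of-the-envelope sketch below uses assumed relative costs to show how that ratio arises; the specific numbers are illustrative, not measured.

```python
# Back-of-the-envelope PRM cost comparison, under the assumption
# that a standalone PRM is roughly backbone-sized while a
# shared-backbone PRM adds only a small scoring head on top of
# hidden states the policy has already computed. Numbers are
# illustrative, not benchmarks of the actual model.

BACKBONE_COST = 1.0   # relative cost of one full forward pass
HEAD_COST = 0.01      # assumed relative cost of a small scoring head

def prm_cost(n_trajectories: int, shared_backbone: bool) -> float:
    if shared_backbone:
        # Hidden states are already available; only the head runs.
        return n_trajectories * HEAD_COST
    # A standalone PRM re-encodes each trajectory from scratch.
    return n_trajectories * (BACKBONE_COST + HEAD_COST)

separate = prm_cost(8, shared_backbone=False)
shared = prm_cost(8, shared_backbone=True)
print(f"reduction: {1 - shared / separate:.0%}")  # ~99% with these numbers
```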

Performance Highlights

MetaStone-S1-32B (specifically the 'high' variant) shows strong benchmark results:

  • AIME24: 85.2 (outperforming DeepSeek-R1-671B and OpenAI-o3-mini-medium)
  • AIME25: 73.6 (competitive with OpenAI-o3-mini-medium)
  • C-EVAL: 89.7 (competitive with DeepSeek-R1-671B)

Good for

  • Applications requiring strong mathematical and coding reasoning.
  • Tasks demanding high-quality, explainable reasoning trajectories.
  • Scenarios where efficient inference for complex reasoning is critical.
  • Use cases benefiting from a long context window for detailed problem-solving.