XueZhang-bjtu/M-Thinker-1.5B-Iter1

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Oct 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

M-Thinker-1.5B-Iter1 by XueZhang-bjtu is a 1.5-billion-parameter large reasoning model (LRM) with a 32,768-token context length, designed to strengthen multilingual reasoning. It addresses limitations in language consistency and reasoning accuracy for non-English languages through a consistency-enhanced reinforcement learning approach. The model maintains input-output language consistency while improving reasoning performance across languages, making it suitable for complex multilingual reasoning tasks.


M-Thinker-1.5B-Iter1: Multilingual Reasoning with Enhanced Consistency

M-Thinker-1.5B-Iter1, developed by XueZhang-bjtu, is a 1.5-billion-parameter Large Reasoning Model (LRM) that targets common limitations of LRMs in non-English languages. Traditional LRMs often fail to keep the input, thought, and answer in the same language, and their reasoning accuracy in non-English contexts typically lags behind their English performance.

Key Capabilities and Innovations

  • Enhanced Multilingual Reasoning: M-Thinker is trained with the GRPO algorithm, augmented by a novel Language Consistency (LC) reward and a Cross-lingual Thinking Alignment (CTA) reward (a rough sketch of how such rewards might combine appears after this list).
  • Near-Perfect Language Consistency: The LC reward enforces strict language consistency between input, thought, and answer, aiming for nearly 100% consistency.
  • Cross-lingual Reasoning Transfer: The CTA reward facilitates the transfer of the model's English reasoning capabilities to non-English languages by comparing reasoning paths.
  • Superior Performance: Through an iterative reinforcement learning procedure, M-Thinker models demonstrate improved performance on multilingual benchmarks like MMATH and PolyMath.
  • Generalization to Out-of-Domain Languages: The model exhibits strong generalization capabilities to languages not explicitly seen during training.
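The exact reward implementation is not published on this card, so the following is only an illustrative sketch of how a language-consistency signal and a cross-lingual alignment signal might be combined with a task reward in a GRPO-style setup. The `detect_language` choice (here, the `langdetect` package), the token-overlap proxy for CTA, and the weights are all assumptions, not the authors' code.

```python
# Illustrative sketch only: the real M-Thinker rewards live in the authors'
# training code; the language detector, CTA proxy, and weights below are
# placeholders chosen for clarity.
from langdetect import detect  # pip install langdetect


def lc_reward(prompt: str, thought: str, answer: str) -> float:
    """Language Consistency reward: 1.0 only if the prompt, the model's
    chain-of-thought, and the final answer are all in the same language."""
    try:
        langs = {detect(prompt), detect(thought), detect(answer)}
    except Exception:
        return 0.0  # undetectable text is treated as inconsistent
    return 1.0 if len(langs) == 1 else 0.0


def cta_reward(thought: str, english_reference_thought: str) -> float:
    """Cross-lingual Thinking Alignment reward (placeholder): score how closely
    a non-English reasoning path tracks an English reference path. The real
    implementation would use a much stronger comparison than token overlap."""
    a, b = set(thought.split()), set(english_reference_thought.split())
    return len(a & b) / max(len(a | b), 1)


def total_reward(correct: bool, prompt: str, thought: str, answer: str,
                 english_thought: str,
                 w_lc: float = 0.5, w_cta: float = 0.5) -> float:
    """Combine task correctness with the LC and CTA auxiliary rewards.
    The weights are arbitrary, not the published ones."""
    task = 1.0 if correct else 0.0
    return task + w_lc * lc_reward(prompt, thought, answer) \
                + w_cta * cta_reward(thought, english_thought)
```

In an iterative RL procedure like the one described above, such per-rollout rewards would feed into GRPO's group-relative advantage estimates, though the precise weighting and scheduling used for M-Thinker are not specified here.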

Ideal Use Cases

  • Multilingual Reasoning Tasks: Particularly effective for complex reasoning problems that require consistent language use across every stage of thought (a minimal loading sketch follows this list).
  • Applications Requiring High Language Consistency: Suitable for scenarios where maintaining the input language throughout the reasoning process is critical.
  • Global Deployment of LRMs: Designed to improve the user experience for non-English speakers by enhancing reasoning accuracy and consistency.
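Assuming the checkpoint is published in the standard Hugging Face Transformers causal-LM layout and ships a chat template (this card does not confirm either), loading and prompting it for a multilingual reasoning task might look roughly like this:

```python
# Minimal usage sketch under the assumptions stated above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XueZhang-bjtu/M-Thinker-1.5B-Iter1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# A non-English prompt: the model is meant to keep its thought and answer
# in the same language as the input.
messages = [{"role": "user",
             "content": "Berechne die Summe der ersten 100 natürlichen Zahlen."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The BF16 dtype and 32k context length follow the metadata above; generation settings such as `max_new_tokens` are arbitrary and should be tuned to the task.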