M-Thinker-1.5B-Iter2: Multilingual Reasoning Model
M-Thinker-1.5B-Iter2 is a 1.5-billion-parameter model developed by Xue Zhang et al., engineered to overcome common limitations of Large Reasoning Models (LRMs) in non-English languages. Traditional LRMs often fail to keep the input, the reasoning trace, and the answer in the same language, and their non-English reasoning paths are less accurate than their English ones.
Key Innovations
This model introduces a novel training approach using the GRPO (Group Relative Policy Optimization) algorithm with two distinct reward mechanisms:
- Language Consistency (LC) Reward: Enforces strict adherence to language consistency between the input, the model's thought process, and the final answer.
- Cross-lingual Thinking Alignment (CTA) Reward: Transfers the model's English reasoning capabilities to non-English languages by comparing non-English reasoning paths with their English counterparts.
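As a loose illustration (not the authors' implementation), the two rewards can be sketched as scoring functions: a language-consistency check over the prompt, thought, and answer, and a similarity score between a non-English reasoning path and its English counterpart. The Unicode-script heuristic and token-overlap similarity below are stand-ins for whatever language-ID and alignment signals the actual training used.

```python
import unicodedata

def dominant_script(text: str) -> str:
    """Crude language proxy: the most frequent Unicode script among
    letters (e.g. LATIN, CJK, HANGUL). A real system would use a
    proper language-identification model instead."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "NONE"

def lc_reward(prompt: str, thought: str, answer: str) -> float:
    """Language Consistency reward (simplified to binary): 1.0 only if
    both the thought and the answer match the prompt's language."""
    target = dominant_script(prompt)
    same = dominant_script(thought) == target == dominant_script(answer)
    return 1.0 if same else 0.0

def cta_reward(non_en_thought: str, en_thought: str) -> float:
    """Cross-lingual Thinking Alignment reward: similarity between a
    non-English reasoning path and its English counterpart. Token-set
    (Jaccard) overlap stands in for a learned alignment score."""
    a = set(non_en_thought.lower().split())
    b = set(en_thought.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

For example, an English thought produced for a Chinese prompt would receive an LC reward of 0.0, while a fully French prompt/thought/answer triple would receive 1.0.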
Performance and Capabilities
Through an iterative Reinforcement Learning (RL) procedure, M-Thinker-1.5B-Iter2 achieves nearly 100% language consistency and demonstrates superior performance on multilingual benchmarks such as MMATH and PolyMath. It also exhibits strong generalization to out-of-domain languages, making it a robust solution for global deployment of reasoning-focused AI applications.
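For context, the group-relative part of GRPO, which normalizes each sampled completion's reward against its group rather than a learned value baseline, can be sketched as follows. The example group and the additive correctness-plus-consistency reward mix are assumptions for illustration, not the paper's exact weighting.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: each completion in a sampled group is
    scored relative to the group's mean reward, normalized by the
    group's standard deviation."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four completions for one prompt; each reward
# adds task correctness and a language-consistency term (assumed weights).
group = [1.0 + 1.0, 1.0 + 0.0, 0.0 + 1.0, 0.0 + 0.0]
advantages = grpo_advantages(group)
```

Completions that are both correct and language-consistent get positive advantages, pushing the policy toward them; the iterative procedure repeats this sampling and updating in rounds (hence the "Iter2" suffix denoting the second iteration).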
Use Cases
- Multilingual Reasoning Tasks: Ideal for applications requiring complex problem-solving and logical deduction in various languages.
- Cross-lingual AI Systems: Enhances the reliability and accuracy of AI systems operating in diverse linguistic environments.
- Improved User Experience: Provides a more consistent and accurate reasoning experience for non-English speakers.