M-Thinker-7B-Iter2: Multilingual Reasoning with Enhanced Consistency
M-Thinker-7B-Iter2 is a 7.6 billion parameter Large Reasoning Model (LRM) developed by Xue Zhang et al., specifically designed to overcome limitations in non-English reasoning tasks. Traditional LRMs often struggle with maintaining input-output language consistency and exhibit lower accuracy in non-English contexts. M-Thinker addresses these issues through a novel training approach.
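For reference, here is a minimal inference sketch using the Hugging Face transformers library. The prompt and generation settings are illustrative assumptions, not the authors' recommended configuration:

```python
# Minimal inference sketch (assumes the standard transformers API;
# generation settings are illustrative, not the authors' recommendation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XueZhang-bjtu/M-Thinker-7B-Iter2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A non-English (German) prompt: "What is the sum of the first 100 natural
# numbers?" The model is expected to think and answer in the prompt's language.
messages = [
    {"role": "user", "content": "Was ist die Summe der ersten 100 natürlichen Zahlen?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```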
Key Capabilities and Innovations
- Consistency-Enhanced Reinforcement Learning (RL): The model is trained with GRPO (Group Relative Policy Optimization), extended with two specialized reward mechanisms (a sketch of the first appears after this list):
  - Language Consistency (LC) Reward: Enforces strict language consistency between the input, the model's thought process, and the final answer.
  - Cross-lingual Thinking Alignment (CTA) Reward: Compares non-English reasoning paths against English reasoning paths to transfer and strengthen reasoning capability across languages.
- Superior Multilingual Performance: Achieves nearly 100% language consistency and strong results on multilingual benchmarks such as MMATH and PolyMath.
- Generalization: Exhibits excellent generalization capabilities on out-of-domain languages, making it robust for diverse linguistic applications.
- Backbone: Built upon the DeepSeek-R1-Distill-Qwen-7B model.
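To make the reward design concrete, below is a minimal, hypothetical sketch of how a Language Consistency (LC) reward could be scored. It assumes DeepSeek-R1-style `<think>...</think>` reasoning tags and uses the off-the-shelf langdetect library as a stand-in language identifier; the paper's actual reward, and the CTA reward in particular (which additionally requires a cross-lingual comparison of reasoning paths), may be implemented differently.

```python
# Hypothetical LC reward sketch: rewards completions whose reasoning and
# answer stay in the prompt's language. Not the paper's implementation.
import re
from langdetect import detect  # stand-in language identifier

def lc_reward(prompt: str, completion: str) -> float:
    """Return 1.0 if prompt, thought, and answer share one language, else 0.0."""
    match = re.search(r"<think>(.*?)</think>(.*)", completion, re.DOTALL)
    if match is None:
        return 0.0  # malformed output: no reasoning block found
    thought, answer = match.group(1), match.group(2)
    try:
        langs = {detect(prompt), detect(thought), detect(answer)}
    except Exception:
        return 0.0  # detector failed on very short or empty text
    return 1.0 if len(langs) == 1 else 0.0
```

In GRPO-style training, a reward like this would be combined with a task-correctness reward over each group of sampled completions, so the policy is pushed toward answers that are both correct and language-consistent.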
Ideal Use Cases
- Multilingual Reasoning Applications: Particularly effective for complex reasoning tasks in non-English languages where maintaining language consistency and high accuracy are critical.
- Global AI Deployment: Suitable for scenarios requiring robust LRM performance for non-English speakers, enhancing user experience and accessibility.