XueZhang-bjtu/M-Thinker-7B-Iter2
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Oct 14, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

XueZhang-bjtu/M-Thinker-7B-Iter2 is a 7.6-billion-parameter large reasoning model (LRM) by Xue Zhang et al. that specializes in multilingual reasoning tasks. Built on the DeepSeek-R1-Distill-Qwen-7B backbone, it is trained with a consistency-enhanced reinforcement learning (RL) algorithm, GRPO, whose reward incorporates Language Consistency (LC) and Cross-lingual Thinking Alignment (CTA) terms so that the model reasons in the language of the question. It achieves near-100% language consistency and strong performance on multilingual benchmarks such as MMATH and PolyMath, making it well suited to non-English reasoning applications.
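As a quick illustration of how the open weights might be run locally, here is a minimal sketch using the standard Hugging Face transformers causal-LM interface. The dtype, prompt, and sampling settings are assumptions for demonstration, not values published with the model; the hosted endpoint above serves the weights in FP8.

```python
# Minimal sketch: load M-Thinker-7B-Iter2 with transformers and ask a
# non-English math question; the model is trained to keep its reasoning
# trace in the same language as the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XueZhang-bjtu/M-Thinker-7B-Iter2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption for local inference
    device_map="auto",
)

# Chinese prompt asking for step-by-step reasoning (illustrative).
messages = [{"role": "user", "content": "请逐步推理：若 3x + 7 = 22，x 等于多少？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,   # reasoning traces can be long
    do_sample=True,
    temperature=0.6,       # illustrative sampling settings
)
# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With LC and CTA rewards applied during training, the generated chain of thought should stay in Chinese rather than drifting into English, which is the behavior the near-100% language-consistency figure refers to.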
