XueZhang-bjtu/M-Thinker-1.5B-Iter1
M-Thinker-1.5B-Iter1 by XueZhang-bjtu is a 1.5-billion-parameter large reasoning model (LRM) with a 32,768-token context length, designed to enhance multilingual reasoning. It addresses limitations in language consistency and reasoning accuracy for non-English languages through a consistency-enhanced reinforcement-learning approach. The model maintains input-output language consistency while improving reasoning performance across languages, making it well suited to complex multilingual reasoning tasks.
M-Thinker-1.5B-Iter1: Multilingual Reasoning with Enhanced Consistency
M-Thinker-1.5B-Iter1, developed by XueZhang-bjtu, is a 1.5 billion parameter Large Reasoning Model (LRM) that focuses on overcoming common limitations of LRMs in non-English languages. Traditional LRMs often struggle with maintaining language consistency across input, thought, and answer, and exhibit lower reasoning accuracy in non-English contexts compared to English.
Key Capabilities and Innovations
- Enhanced Multilingual Reasoning: M-Thinker is trained with the GRPO (Group Relative Policy Optimization) algorithm, incorporating a novel Language Consistency (LC) reward and a Cross-lingual Thinking Alignment (CTA) reward.
- Near-Perfect Language Consistency: The LC reward enforces strict language consistency between input, thought, and answer, aiming for nearly 100% consistency.
- Cross-lingual Reasoning Transfer: The CTA reward facilitates the transfer of the model's English reasoning capabilities to non-English languages by comparing reasoning paths.
- Superior Performance: Through an iterative reinforcement learning procedure, M-Thinker models demonstrate improved performance on multilingual benchmarks such as MMATH and PolyMath.
- Generalization to Out-of-Domain Languages: The model exhibits strong generalization capabilities to languages not explicitly seen during training.
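To make the reward design above concrete, here is a simplified Python sketch of a language-consistency reward. The Unicode-script detector, the binary LC reward, the composite reward, and the weight `alpha` are all illustrative stand-ins, not the formulation actually used to train M-Thinker:

```python
# Simplified stand-in for the LC reward described above. The real reward
# likely uses a proper language identifier; this sketch approximates
# "language" by the dominant Unicode script of the text.

def detect_script(text: str) -> str:
    """Crude language proxy: classify text by its dominant character range."""
    counts = {"latin": 0, "cjk": 0, "cyrillic": 0, "other": 0}
    for ch in text:
        code = ord(ch)
        if ch.isascii() and ch.isalpha():
            counts["latin"] += 1
        elif 0x4E00 <= code <= 0x9FFF or 0x3040 <= code <= 0x30FF:
            counts["cjk"] += 1  # CJK ideographs plus Japanese kana
        elif 0x0400 <= code <= 0x04FF:
            counts["cyrillic"] += 1
        elif ch.isalpha():
            counts["other"] += 1  # accented Latin, Arabic, Devanagari, ...
    return max(counts, key=counts.get)

def lc_reward(question: str, thought: str, answer: str) -> float:
    """Binary reward: 1.0 only when input, thought, and answer all match."""
    target = detect_script(question)
    if detect_script(thought) == target and detect_script(answer) == target:
        return 1.0
    return 0.0

def total_reward(is_correct: bool, question: str, thought: str,
                 answer: str, alpha: float = 0.5) -> float:
    """Hypothetical composite reward: accuracy plus weighted consistency.
    The weight alpha is an assumption, not a value from the paper."""
    return float(is_correct) + alpha * lc_reward(question, thought, answer)
```

In a GRPO-style setup, such a scalar reward would be computed per sampled completion and normalized within each group of rollouts; the point of the sketch is only that a response reasoning in the wrong language earns no consistency credit even when the final answer is correct.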
Ideal Use Cases
- Multilingual Reasoning Tasks: Particularly effective for complex reasoning problems requiring consistent language use across different stages of thought.
- Applications Requiring High Language Consistency: Suitable for scenarios where maintaining the input language throughout the reasoning process is critical.
- Global Deployment of LRMs: Designed to improve the user experience for non-English speakers by enhancing reasoning accuracy and consistency.
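For deployment, a minimal inference sketch might look like the following. It assumes the model follows the standard Hugging Face transformers chat interface with a chat template; check the repository files for the authors' actual recommended usage and sampling settings:

```python
# Illustrative inference sketch; generation parameters are assumptions,
# not the authors' recommended settings.

MODEL_ID = "XueZhang-bjtu/M-Thinker-1.5B-Iter1"

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat-message format."""
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 2048) -> str:
    # Imported lazily so the sketch can be read without the dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# generate("二加二等于几？请给出推理过程。")
# If the LC training holds up, both the thought and the answer
# should come back in Chinese.
```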