Vinnnf/Thinkless-1.5B-RL-DeepScaleR is a 1.5-billion-parameter language model developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang. It is trained with reinforcement learning using a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which enables it to adaptively select between short-form and long-form reasoning. By skipping unnecessary long-chain thinking, the model reduces computational cost while performing strongly on mathematical and reasoning benchmarks such as Minerva Algebra, MATH-500, and GSM8K.
Thinkless: Adaptive Reasoning LLM
Thinkless-1.5B-RL-DeepScaleR is a 1.5-billion-parameter model designed to decide when to engage in detailed, long-form reasoning and when a concise, short-form response suffices. Developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang, the model uses a reinforcement learning framework built around two control tokens: <short> for brief answers and <think> for in-depth reasoning.
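The control-token mechanism can be illustrated with a minimal prompt-building sketch. The token names <short> and <think> come from the model card; the exact chat template, the helper name `build_prompt`, and the idea of pre-filling a token to force a mode are assumptions for illustration, not the official inference API.

```python
# Hypothetical sketch: steering the reasoning mode via control tokens.
# Token names <short>/<think> are from the model card; the prompt format
# and the forcing trick are illustrative assumptions.

THINK_TOKEN = "<think>"
SHORT_TOKEN = "<short>"


def build_prompt(question: str, force_mode: str = None) -> str:
    """Build a prompt; optionally pre-fill a control token to force a mode.

    With force_mode=None, the model itself emits <short> or <think> as the
    first generated token, and the learned policy decides reasoning depth.
    """
    prompt = f"User: {question}\nAssistant: "
    if force_mode == "think":
        prompt += THINK_TOKEN
    elif force_mode == "short":
        prompt += SHORT_TOKEN
    return prompt
```

Pre-filling the assistant turn with one of the two tokens constrains generation to that mode, while leaving it blank lets the policy choose adaptively.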
Key Capabilities
- Adaptive Reasoning: Employs a learnable framework to select optimal reasoning modes based on task complexity and the model's internal assessment.
- Computational Efficiency: Reduces the use of long-chain thinking by 50%–90% on various benchmarks, lowering inference cost compared to conventional Reasoning Language Models.
- Decoupled Learning: Uses a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm that separates the control-token selection objective from the response-accuracy objective, stabilizing training and preventing mode collapse.
- Mathematical Proficiency: Demonstrates strong performance on mathematical and reasoning benchmarks such as Minerva Algebra, MATH-500, and GSM8K.
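The decoupling idea behind DeGRPO can be sketched numerically. This is an illustrative toy, not the authors' implementation: the function name `degrpo_token_weights`, the `alpha` trade-off parameter, and the exact normalization are assumptions. The point it demonstrates is that when a single mode-control token is averaged together with hundreds of response tokens (as plain token-level averaging would do), its gradient signal is drowned out; normalizing the two groups separately keeps the control signal comparable in scale.

```python
# Illustrative sketch of DeGRPO-style decoupled weighting (assumed, not official).
# Standard whole-sequence averaging gives the lone control token a weight of
# 1/(1+N) among N response tokens; decoupling normalizes each group on its own.

def degrpo_token_weights(num_response_tokens: int, alpha: float = 1.0):
    """Return (control_weight, per_response_token_weight).

    The control token is normalized over its own group of size 1 (scaled by
    alpha); response tokens are normalized over their own group, so response
    length no longer suppresses the mode-selection gradient.
    """
    control_weight = alpha
    response_weight = 1.0 / max(num_response_tokens, 1)
    return control_weight, response_weight


# With 100 response tokens, coupled averaging would weight the control token
# at 1/101; decoupled weighting keeps it at alpha regardless of length.
```

The design choice this illustrates is why decoupling stabilizes training: the balance between learning *which* mode to pick and learning to answer accurately stays fixed rather than drifting with response length.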
Good For
- Applications requiring efficient and adaptive reasoning, especially in mathematical problem-solving.
- Scenarios where balancing response conciseness with reasoning depth is crucial.
- Reducing inference costs for reasoning-intensive tasks by avoiding unnecessary complex thought processes.