Vinnnf/Thinkless-1.5B-RL-DeepScaleR

Parameters: 1.5B
Precision: BF16
Context length: 32,768 tokens
Released: May 16, 2025
License: apache-2.0

Vinnnf/Thinkless-1.5B-RL-DeepScaleR is a 1.5 billion parameter language model developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang. It is trained with reinforcement learning using the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which enables it to adaptively choose between short-form and long-form reasoning. By skipping unnecessary long-chain thinking, the model reduces computational cost while performing strongly on mathematical and reasoning benchmarks such as Minerva Algebra, MATH-500, and GSM8K.

Overview

Thinkless: Adaptive Reasoning LLM

Thinkless-1.5B-RL-DeepScaleR is a 1.5 billion parameter model designed to decide intelligently when to engage in detailed, long-form reasoning and when a concise, short-form response suffices. Developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang, it uses a reinforcement learning framework with two control tokens: <short> for brief answers and <think> for in-depth reasoning.
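
The control tokens above mean a caller can tell which reasoning mode the model chose by inspecting the start of its output. A minimal sketch of such a check is below; the token names come from this card, but the parsing logic (and the assumption that the control token appears first in the generation) is illustrative, not the authors' reference code.

```python
# Sketch: detect which Thinkless control token a generation starts with.
# <short> and <think> are the control tokens described on this card;
# everything else here is an illustrative assumption.

def reasoning_mode(generation: str) -> str:
    """Return the reasoning mode the model chose for this output."""
    text = generation.lstrip()
    if text.startswith("<think>"):
        return "long"   # detailed chain-of-thought follows
    if text.startswith("<short>"):
        return "short"  # concise direct answer follows
    return "unknown"    # no recognized control token

print(reasoning_mode("<short> The answer is 42."))  # → "short"
print(reasoning_mode("<think> Let x be the ..."))   # → "long"
```

A router like this could, for example, log how often the model chooses long-form reasoning on a given workload, which is the quantity the efficiency claims below are about.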

Key Capabilities

  • Adaptive Reasoning: Employs a learnable framework to select optimal reasoning modes based on task complexity and the model's internal assessment.
  • Computational Efficiency: Reduces the use of long-chain thinking by 50%-90% across benchmarks, lowering computational cost compared to conventional Reasoning Language Models.
  • Decoupled Learning: Incorporates a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, separating control token selection from response accuracy, which stabilizes training and prevents collapse.
  • Mathematical Proficiency: Demonstrates strong performance on mathematical and reasoning benchmarks such as Minerva Algebra, MATH-500, and GSM8K.

Good For

  • Applications requiring efficient and adaptive reasoning, especially in mathematical problem-solving.
  • Scenarios where balancing response conciseness with reasoning depth is crucial.
  • Reducing inference costs for reasoning-intensive tasks by avoiding unnecessary complex thought processes.