Vinnnf/Thinkless-1.5B-RL-DeepScaleR

Warm
Public
1.5B
BF16
32768
1
May 16, 2025
License: apache-2.0
Hugging Face

Vinnnf/Thinkless-1.5B-RL-DeepScaleR is a 1.5 billion parameter language model developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang. It is trained under a reinforcement learning paradigm using a Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, enabling it to adaptively select between short-form and long-form reasoning. This model is optimized to reduce computational costs by minimizing unnecessary long-chain thinking, particularly excelling in mathematical and reasoning benchmarks like Minerva Algebra, MATH-500, and GSM8K.

No reviews yet. Be the first to review!