Vinnnf/Thinkless-1.5B-RL-DeepScaleR is a 1.5-billion-parameter language model developed by Gongfan Fang, Xinyin Ma, and Xinchao Wang. It is trained with reinforcement learning using the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which enables it to adaptively choose between short-form and long-form reasoning for each query. By avoiding unnecessary long-chain thinking, the model reduces inference cost while performing strongly on mathematical and reasoning benchmarks such as Minerva Algebra, MATH-500, and GSM8K.
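A minimal usage sketch with the Hugging Face `transformers` library is below. The control-token names `<short>` and `<think>`, and the helper `reasoning_mode`, are assumptions for illustration based on the hybrid short/long reasoning design described above; they are not confirmed by this page.

```python
def reasoning_mode(generated_text: str) -> str:
    """Classify which reasoning mode the model chose from its leading
    control token. The token names "<short>" and "<think>" are assumed,
    not documented on this page."""
    stripped = generated_text.lstrip()
    if stripped.startswith("<think>"):
        return "long"   # full chain-of-thought reasoning
    if stripped.startswith("<short>"):
        return "short"  # concise direct answer
    return "unknown"


def main() -> None:
    # Requires `pip install transformers torch` and downloads the
    # 1.5B-parameter checkpoint on first run.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Vinnnf/Thinkless-1.5B-RL-DeepScaleR"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "What is 12 * 7?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Keep special tokens so any control token at the start is visible.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=False,
    )
    print(f"mode={reasoning_mode(completion)}")
    print(completion)


if __name__ == "__main__":
    main()
```

The `reasoning_mode` helper is pure string inspection, so it can be reused to measure how often the model routes a dataset to short-form answers versus long-form reasoning.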