Vinnnf/Thinkless-1.5B-Warmup is a 1.5 billion parameter language model developed by Vinnnf, serving as a warmup model for the Thinkless framework. This model is designed to learn adaptive reasoning, allowing an LLM to select between short-form and long-form reasoning based on task complexity. It utilizes the control tokens `<short>` and `<think>` to manage response verbosity, aiming to reduce computational costs by minimizing unnecessary long-chain thinking. The model is trained to improve reasoning efficiency and accuracy on benchmarks like Minerva Algebra, MATH-500, and GSM8K.
Thinkless: Adaptive Reasoning LLM
Vinnnf/Thinkless-1.5B-Warmup is a 1.5 billion parameter model that is part of the larger Thinkless framework, developed by Vinnnf. The core innovation of Thinkless is to enable LLMs to adaptively choose between concise (`<short>`) and detailed (`<think>`) reasoning based on the complexity of a given task and the model's own capabilities. This approach aims to significantly reduce the computational cost associated with reasoning language models.
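The routing mechanism described above can be pictured as a decode-time decision: the model's first generated token is one of the two control tokens, and the rest of the response follows in the chosen mode. The sketch below illustrates this with plain Python; the probability values are made up for illustration and `choose_mode` is a hypothetical helper, not part of the model's API.

```python
def choose_mode(first_token_probs: dict) -> str:
    """Pick the reasoning mode from the model's first-token distribution.

    A Thinkless-style model emits either <short> or <think> as its first
    token; whichever is more probable determines the response style.
    """
    return max(("<short>", "<think>"), key=lambda t: first_token_probs.get(t, 0.0))

# Illustrative values: an easy query where the model favors a concise answer.
probs = {"<short>": 0.7, "<think>": 0.3}
mode = choose_mode(probs)  # -> "<short>"
```

In practice the same effect can be forced at inference time by prefilling the chosen control token into the prompt, steering the model into short-form or long-form reasoning directly.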
Key Capabilities
- Adaptive Reasoning Selection: Employs control tokens (`<short>` and `<think>`) to dynamically switch between short-form and long-form reasoning.
- Decoupled Group Relative Policy Optimization (DeGRPO): Utilizes a novel RL algorithm that separates the learning objectives for reasoning mode selection and answer accuracy, ensuring stable training and preventing mode collapse.
- Computational Efficiency: Demonstrated to reduce the usage of long-chain thinking by 50–90% on benchmarks such as Minerva Algebra, MATH-500, and GSM8K, leading to lower inference costs.
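The decoupling idea behind DeGRPO can be sketched as two separately normalized policy-gradient terms: one for the single mode token and one averaged over the (much longer) answer, so the mode decision is not drowned out by answer-token gradients. This is a schematic illustration under assumed notation, not the paper's implementation; the balancing weight `alpha` and the mean normalization are assumptions.

```python
def degrpo_loss(mode_logprob: float, answer_logprobs: list, advantage: float,
                alpha: float = 0.5) -> float:
    """Schematic decoupled objective: separate terms for the mode token
    and the answer tokens, combined with a hypothetical weight alpha."""
    mode_term = -advantage * mode_logprob                 # one control token
    answer_term = -advantage * sum(answer_logprobs) / len(answer_logprobs)
    return alpha * mode_term + (1.0 - alpha) * answer_term

# Example: positive advantage pushes both the chosen mode and the answer
# tokens toward higher probability, with equal weight on each term.
loss = degrpo_loss(mode_logprob=-1.0, answer_logprobs=[-2.0, -4.0], advantage=1.0)
```

Standard GRPO would average the mode token together with the answer tokens, letting the long answer dominate; keeping the two terms separate is what stabilizes mode selection during training.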
Good For
- Applications requiring efficient and adaptive reasoning.
- Scenarios where balancing response verbosity with computational cost is crucial.
- Research into reinforcement learning for LLM control and reasoning optimization.