qihoo360/Light-R1-14B-DS
Light-R1-14B-DS by Qihoo360 is a 14-billion-parameter language model, fine-tuned from DeepSeek-R1-Distill-Qwen-14B and optimized for mathematical reasoning. It is the first open-source model of its size to successfully apply Reinforcement Learning (RL) to a base model that had already been fine-tuned on long Chain-of-Thought (CoT) data, and it does so under a light computational budget. It achieves state-of-the-art results among 14B math models, scoring 74.0 on AIME24 and 60.2 on AIME25, which makes it suitable for advanced mathematical problem solving.
Light-R1-14B-DS: State-of-the-Art 14B Math Model
Light-R1-14B-DS, developed by Qihoo360, is a 14-billion-parameter model derived from DeepSeek-R1-Distill-Qwen-14B. It is the first open-source model of its size to successfully apply Reinforcement Learning (RL) to a model already fine-tuned on long Chain-of-Thought (CoT) data while staying within a modest computational budget. The RL stage yields measurable gains in mathematical reasoning over the base model.
Key Capabilities & Performance
- State-of-the-Art Math Performance: Achieves leading scores among 14B math models, with 74.0 on AIME24 and 60.2 on AIME25, surpassing many 32B models.
- RL Post-Training: Trained with a long-CoT RL post-training stage in which response length and reward scores increased together, the behavior expected from successful RL on reasoning models.
- Robustness: Performs well on the GPQA benchmark without any GPQA-specific training, indicating that the math-focused training generalizes.
- Data Decontamination: Training data was decontaminated against the evaluation benchmarks using exact matching and N-gram matching, to protect benchmark integrity.
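The exact and N-gram matching mentioned under Data Decontamination can be illustrated with a minimal sketch. This is not the authors' pipeline: the tokenization rule, the default N-gram size, and the function names (`normalize`, `ngrams`, `is_contaminated`, `decontaminate`) are all assumptions chosen for illustration.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of all contiguous n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(sample: str, benchmark_prompts: list[str], n: int = 8) -> bool:
    """Flag a training sample that matches a benchmark prompt exactly
    (after normalization) or shares at least one n-gram with it."""
    sample_tokens = normalize(sample)
    sample_grams = ngrams(sample_tokens, n)
    for prompt in benchmark_prompts:
        bench_tokens = normalize(prompt)
        # Exact match: catches short prompts that have no full n-gram.
        if sample_tokens and sample_tokens == bench_tokens:
            return True
        # N-gram match: any shared window of n tokens counts as overlap.
        if sample_grams & ngrams(bench_tokens, n):
            return True
    return False


def decontaminate(train: list[str], benchmark_prompts: list[str], n: int = 8) -> list[str]:
    """Keep only the training samples that are not flagged."""
    return [s for s in train if not is_contaminated(s, benchmark_prompts, n)]


if __name__ == "__main__":
    bench = ["Find the number of ordered pairs (a, b) of positive integers such that a + b = 10."]
    train = [
        "FIND the number of ordered pairs (a, b) of positive integers such that a + b = 10.",
        "Homework: find the number of ordered pairs a b of positive integers with a*b = 12",
        "Compute the derivative of x squared.",
    ]
    print(decontaminate(train, bench, n=5))
```

A smaller `n` removes more near-duplicates at the cost of more false positives; the value used for this model is not stated here.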
Good For
- Advanced Mathematical Problem Solving: Ideal for applications requiring high-accuracy mathematical reasoning and complex problem-solving.
- Research in RL for LLMs: Provides an open-source example of RL applied successfully to a model that has already been fine-tuned on long CoT data.
- Benchmarking and Development: A strong candidate for evaluating and developing new techniques in mathematical AI.