qihoo360/Light-R1-32B

Text generation · Concurrency cost: 2 · Model size: 32.8B · Quantization: FP8 · Context length: 32k · Published: Mar 4, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Light-R1-32B is a 32.8 billion parameter language model developed by Qihoo360, fine-tuned from Qwen2.5-32B-Instruct. It is optimized for complex mathematical reasoning, achieving state-of-the-art performance on the AIME24 and AIME25 benchmarks among models trained without long Chain-of-Thought (CoT) data. The model uses curriculum SFT, DPO, and model merging to surpass previous R1-Distill models, making it highly effective for advanced math problem-solving.


Light-R1-32B: Advanced Math Reasoning Model

Light-R1-32B, developed by Qihoo360, is a 32.8 billion parameter model fine-tuned from Qwen2.5-32B-Instruct. It is designed to excel at complex mathematical reasoning tasks, particularly those requiring Chain-of-Thought (CoT) capabilities, even though its base model was not trained on long CoT data.

Key Capabilities & Differentiators

  • Superior Math Performance: Achieves a score of 76.6 on AIME24 and 64.6 on AIME25, surpassing DeepSeek-R1-Distill-Qwen-32B and other models in its class.
  • Cost-Efficient Training: Developed using a curriculum SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) approach, with an estimated training cost of approximately $1000 (6 hours on 12 x H800 machines).
  • Transparent & Reproducible: All training datasets (SFT and DPO) and training code based on 360-LLaMA-Factory are open-sourced, providing a validated method for training strong long COT models.
  • Data Decontamination: Rigorous decontamination of training data against common reasoning benchmarks like AIME24/25 and MATH-500 to ensure robust and unbiased evaluation.
  • Forced Thinking Mechanism: Incorporates a <think> token in its chat template to explicitly prompt the model for reasoning steps, enhancing its problem-solving process.
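The forced-thinking idea above can be sketched as a prompt builder that opens the assistant turn with `<think>`, so generation must begin with reasoning. This is an illustrative sketch only: the ChatML-style markers and the exact placement of the tag are assumptions; the authoritative template ships with the model's tokenizer configuration.

```python
# Illustrative sketch of a "forced thinking" chat template: the assistant
# turn is opened with a <think> tag so the model emits its reasoning first.
# Marker tokens here are assumed ChatML-style; the real template is defined
# in the model's tokenizer config.

def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt that forces a reasoning block."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>"  # the model continues with its chain of thought
    )

prompt = build_prompt("What is 7 * 8?")
```

Because the prompt ends mid-turn with `<think>`, the model's continuation is constrained to start inside a reasoning block rather than jumping straight to an answer.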

Ideal Use Cases

  • Advanced Mathematical Problem Solving: Excels in competitive math challenges and complex quantitative analysis.
  • Research & Development: Provides a transparent and cost-effective baseline for developing and experimenting with long COT models.
  • Educational Tools: Can be integrated into systems requiring high-accuracy mathematical reasoning and step-by-step solutions.
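For tools that consume step-by-step solutions, the reasoning block can be separated from the final answer with simple parsing. The sketch below assumes the model closes its reasoning with a `</think>` tag, which is an assumption about the output format rather than documented behavior.

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning block from the final answer.

    Assumes the model closes its reasoning with </think>; if no such block
    is found, the whole completion is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", completion.strip()

reasoning, answer = split_reasoning(
    "<think>7 * 8 = 56.</think>The answer is 56."
)
```

Splitting this way lets an application display or log the chain of thought separately while showing the user only the final answer.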