# Light-R1-32B: Advanced Math Reasoning Model

Light-R1-32B, developed by Qihoo360, is a 32.8-billion-parameter model fine-tuned from Qwen2.5-32B-Instruct. It is specifically designed to excel at complex mathematical reasoning tasks, particularly those requiring long Chain-of-Thought (CoT) capabilities, even though its base model was not initially trained on long-CoT data.
## Key Capabilities & Differentiators
- Superior Math Performance: Achieves a score of 76.6 on AIME24 and 64.6 on AIME25, surpassing DeepSeek-R1-Distill-Qwen-32B and other models in its class.
- Cost-Efficient Training: Developed using a curriculum of SFT (Supervised Fine-Tuning) followed by DPO (Direct Preference Optimization), with an estimated training cost of approximately $1,000 (6 hours on 12 H800 machines).
- Transparent & Reproducible: All training datasets (SFT and DPO) and the training code, based on 360-LLaMA-Factory, are open-sourced, providing a validated recipe for training strong long-CoT models.
- Data Decontamination: Training data is rigorously decontaminated against common reasoning benchmarks such as AIME24/25 and MATH-500 to ensure robust, unbiased evaluation.
- Forced Thinking Mechanism: Incorporates a `<think>` token in its chat template to explicitly prompt the model for reasoning steps, enhancing its problem-solving process.
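Benchmark decontamination of the kind described above is commonly done with word-level n-gram overlap against the benchmark problems. The sketch below is illustrative only; the project's actual matching rules and n-gram size are assumptions here, not taken from the released code:

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in a text (assumed n=8)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(sample: str, benchmark_problems: list, n: int = 8) -> bool:
    """Flag a training sample that shares any n-gram with a benchmark problem."""
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(problem, n) for problem in benchmark_problems)
```

A sample quoting a benchmark problem verbatim would be dropped from the training set, while unrelated problems pass through; real pipelines typically add normalization (punctuation stripping, number masking) on top of this.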
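The forced-thinking mechanism can be illustrated with a minimal prompt-building sketch. The template below is a hypothetical stand-in, not the model's published chat template (which ships with the released tokenizer); it only shows the idea of ending the prompt with an open `<think>` tag so generation begins inside the reasoning block:

```python
def build_prompt(question: str) -> str:
    """Assemble a prompt that forces the model to start with reasoning.

    Hypothetical template for illustration: the authoritative template
    comes from the checkpoint's tokenizer (e.g. via apply_chat_template).
    """
    return (
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # generation starts inside the thinking block
    )


prompt = build_prompt("What is the sum of the first 100 positive integers?")
```

Because the assistant turn already opens with `<think>`, the model's first generated tokens are reasoning steps rather than a final answer.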
## Ideal Use Cases
- Advanced Mathematical Problem Solving: Excels at competition-style math challenges and complex quantitative analysis.
- Research & Development: Provides a transparent, cost-effective baseline for developing and experimenting with long-CoT models.
- Educational Tools: Can be integrated into systems requiring high-accuracy mathematical reasoning and step-by-step solutions.