Skywork-OR1-7B: Open Reasoner for Math and Code
Skywork-OR1-7B is a 7.6-billion-parameter model in Skywork's OR1 (Open Reasoner 1) series. It is engineered specifically for advanced math and code reasoning, leveraging large-scale rule-based reinforcement learning (RL) with meticulously designed datasets and training methodologies.
Key Capabilities & Features
- Specialized Reasoning: Optimized for complex mathematical problems and coding challenges.
- Competitive Performance: Demonstrates strong performance against other models of similar size in both math (AIME24, AIME25) and coding (LiveCodeBench) benchmarks.
- Robust Evaluation Metric: Utilizes Avg@K (average performance across K independent attempts) for evaluation, providing a more reliable measure of stability and reasoning consistency than Pass@1.
- Advanced Training: Employs a customized GRPO (Group Relative Policy Optimization) approach, incorporating data-wise and training-wise improvements such as difficulty-based filtering, rejection sampling, and a multi-stage training pipeline with adaptive entropy control.
- Curated Data: Trained on a dataset of 110K verifiable math problems and 14K coding questions, with model-aware difficulty estimation and rigorous quality assessment.
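The Avg@K metric mentioned above can be illustrated with a minimal sketch. The function names and the per-attempt 0/1 grading convention here are illustrative assumptions, not the Skywork-OR1 evaluation code; the point is that averaging over K independent attempts captures run-to-run stability that a single-attempt score hides.

```python
def pass_at_1(attempts):
    """Score only the first attempt: 1.0 if correct, else 0.0."""
    return float(attempts[0])

def avg_at_k(attempts, k):
    """Average correctness over the first k independent attempts."""
    sample = attempts[:k]
    return sum(sample) / len(sample)

# One problem attempted 8 times; the model solved it on 5 of 8 runs.
runs = [1, 0, 1, 1, 0, 1, 0, 1]
print(pass_at_1(runs))    # 1.0 -- the first attempt happened to succeed
print(avg_at_k(runs, 8))  # 0.625 -- reflects consistency across attempts
```

A model that solves a problem on every attempt and one that solves it once in eight runs can both score 1.0 on Pass@1; Avg@K separates them.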
Ideal Use Cases
- Mathematical Problem Solving: Excels in tasks requiring deep mathematical reasoning, as evidenced by its strong AIME scores.
- Code Generation and Debugging: Highly effective for coding scenarios, performing well on benchmarks like LiveCodeBench.
- Research in Reasoning Models: Provides a strong foundation for further research into open reasoning models, with its training data and code open-sourced.
Skywork-OR1-7B is built upon DeepSeek-R1-Distill-Qwen-7B and trained using a custom fork of the verl project, with the goal of pushing the frontier of open reasoning capabilities.
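The model-aware difficulty estimation and difficulty-based filtering described above can be sketched roughly as follows: sample the current model several times per problem, estimate a per-problem solve rate, and keep only problems that are neither always nor never solved, since those extremes carry no reward signal for RL. The helper names and thresholds below are hypothetical, not taken from the Skywork pipeline.

```python
def estimate_difficulty(solve_counts, n_attempts):
    """Model-aware difficulty: per-problem solve rate under the current model."""
    return {pid: count / n_attempts for pid, count in solve_counts.items()}

def filter_by_difficulty(solve_rates, low=0.0, high=1.0):
    """Drop problems never solved (no positive reward to learn from) and
    problems always solved (nothing left to learn); keep the middle band."""
    return [pid for pid, rate in solve_rates.items() if low < rate < high]

# Suppose each problem was attempted 16 times by the current model.
counts = {"p1": 9, "p2": 0, "p3": 16, "p4": 4}
rates = estimate_difficulty(counts, 16)
print(filter_by_difficulty(rates))  # p2 (0/16) and p3 (16/16) are dropped
```

Rejection sampling during training follows the same logic at the rollout level: responses with unverifiable or uninformative rewards are discarded before the policy update.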