Skywork/Skywork-OR1-32B-Preview
Skywork/Skywork-OR1-32B-Preview is a 32.8-billion-parameter Open Reasoner 1 (OR1) model developed by Skywork, designed for advanced mathematical and coding reasoning tasks. The model leverages large-scale rule-based reinforcement learning and a multi-stage training pipeline to achieve performance comparable to the 671B-parameter DeepSeek-R1 on the AIME24, AIME25, and LiveCodeBench benchmarks. It is optimized for complex problem-solving in math and code, making it suitable for applications requiring robust reasoning capabilities.
Skywork-OR1-32B-Preview: Advanced Reasoning Model
Skywork-OR1-32B-Preview is part of the Skywork-OR1 (Open Reasoner 1) series, a collection of models specifically engineered for mathematical and coding reasoning. Developed by Skywork, this 32.8-billion-parameter model utilizes a sophisticated training methodology involving large-scale rule-based reinforcement learning and carefully curated datasets.
Key Capabilities & Differentiators
- Exceptional Reasoning Performance: The model is designed to deliver high performance on complex reasoning tasks, particularly in mathematics and coding.
- Benchmark Parity: It achieves performance on par with the 671-billion-parameter DeepSeek-R1 model across key benchmarks such as AIME24, AIME25, and LiveCodeBench, despite being significantly smaller.
- Advanced Training: Employs a customized version of GRPO with both offline and online difficulty-based filtering, rejection sampling, and a multi-stage training pipeline with adaptive entropy control to enhance exploration and stability.
- Curated Data: Trained on a meticulously selected and cleaned dataset comprising 110K verifiable math problems and 14K coding questions, with model-aware difficulty estimation.
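The difficulty-based filtering mentioned above can be illustrated with a minimal sketch. The idea is that under rule-based RL, problems the current model always solves (or never solves) give every rollout the same reward and thus no learning signal, so they are filtered out. The function names, thresholds, and sample count below are illustrative assumptions, not Skywork's actual implementation:

```python
import random

def estimate_pass_rate(problem, attempt_fn, n_samples=8):
    """Estimate a problem's empirical pass rate by sampling the model
    n_samples times; attempt_fn stands in for one rollout plus a
    rule-based verifier returning True/False."""
    return sum(attempt_fn(problem) for _ in range(n_samples)) / n_samples

def filter_by_difficulty(problems, attempt_fn, low=0.0, high=1.0, n_samples=8):
    """Keep only problems whose pass rate lies strictly inside (low, high).
    Problems at rate 0 or 1 yield identical rewards for all rollouts and
    therefore no gradient signal under rule-based RL."""
    kept = []
    for p in problems:
        rate = estimate_pass_rate(p, attempt_fn, n_samples)
        if low < rate < high:
            kept.append(p)
    return kept

# Toy demo with a stand-in "model": solve probability per problem.
random.seed(0)
problems = ["p_easy", "p_mid", "p_hard"]
solve_prob = {"p_easy": 1.0, "p_mid": 0.5, "p_hard": 0.0}
attempt = lambda p: random.random() < solve_prob[p]
print(filter_by_difficulty(problems, attempt))  # keeps only mid-difficulty problems
```

In practice this filtering is "model-aware": pass rates are re-estimated with the model being trained, so the retained set tracks the model's current frontier of difficulty.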
Evaluation Metrics
Skywork-OR1-32B-Preview is evaluated using Avg@K (average performance across K independent attempts) rather than the traditional Pass@1, providing a more reliable measure of stability and reasoning consistency. On AIME24, it scores 79.7 (Avg@32); on AIME25, 69.0 (Avg@32); and on LiveCodeBench, 63.9 (Avg@4).
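The Avg@K metric described above is straightforward to compute: score each of K independent attempts per problem as correct or not, average within each problem, then average across problems. A minimal sketch (the toy data is hypothetical, not benchmark results):

```python
from statistics import mean

def avg_at_k(results):
    """Avg@K: average correctness over each problem's K attempts,
    then average those per-problem rates across all problems.
    results: list of per-problem lists of 0/1 attempt outcomes."""
    return mean(mean(attempts) for attempts in results)

# Toy data: 3 problems, K=4 attempts each.
scores = [[1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
print(round(avg_at_k(scores), 4))  # prints 0.6667
```

Averaging over many attempts (K=32 on AIME, K=4 on LiveCodeBench) smooths out sampling variance, which is why it is a more stable measure than a single Pass@1 run.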
Ideal Use Cases
- Complex Mathematical Problem Solving: Excels in scenarios requiring advanced mathematical reasoning.
- Code Generation and Debugging: Highly effective for coding tasks, as demonstrated by its LiveCodeBench performance.
- Research and Development: Suitable for researchers exploring advanced reasoning capabilities in LLMs, particularly those interested in reinforcement learning-based training methodologies.