PKU-DS-LAB/FairyR1-32B
FairyR1-32B is a 32-billion-parameter reasoning model developed by PKU-DS-LAB, built upon the DeepSeek-R1-Distill-Qwen-32B base with a 32,768-token context length. It leverages a novel "distill-and-merge" pipeline to achieve performance comparable to much larger models on mathematical and coding tasks. The model is optimized for efficiency, offering strong task-specific performance with significantly fewer parameters and lower inference costs.
Overview
FairyR1-32B is a 32-billion-parameter large language model developed by PKU-DS-LAB, designed for high efficiency and strong performance in specific domains. It is built on the DeepSeek-R1-Distill-Qwen-32B base and uses a "distill-and-merge" pipeline to achieve competitive results with a fraction of the parameters of larger models. This approach combines task-focused fine-tuning with model-merging techniques, significantly reducing model size and inference cost.
Key Capabilities & Performance
- Mathematical Reasoning: Achieves 80.4 on AIME 2024 and 75.6 on AIME 2025, matching or exceeding DeepSeek-R1-671B.
- Code Generation: Scores 67.7 on LiveCodeBench, outperforming DeepSeek-R1-671B.
- Efficiency: Delivers comparable or superior performance in math and coding using only ~5% of the parameters of much larger models.
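As a quick sanity check on the "~5%" figure, the headline parameter counts from the comparison above (32B for FairyR1, 671B for DeepSeek-R1) work out as follows; this is a simple arithmetic sketch, not output from either model:

```python
# Rough parameter-count comparison between FairyR1-32B and DeepSeek-R1-671B,
# using the nominal counts quoted in this card.
fairy_params = 32e9       # FairyR1-32B
deepseek_params = 671e9   # DeepSeek-R1-671B

ratio = fairy_params / deepseek_params
print(f"FairyR1-32B uses {ratio:.1%} of DeepSeek-R1-671B's parameters")
# prints: FairyR1-32B uses 4.8% of DeepSeek-R1-671B's parameters
```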
Training & Methodology
The model's development involved an overhauled distillation data pipeline: candidate answers were generated from multiple "teacher" models on datasets such as AIMO/NuminaMath-1.5 (mathematics) and OpenThoughts-114k (code), then refined, particularly the Chain-of-Thought (CoT) trajectories, and filtered into focused training sets. Two domain experts (math and code) were trained independently and then fused into a single 32B-parameter model using the ArceeFusion tool. This streamlined process enables strong task-specific performance while maintaining a small model footprint.
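The fusion step can be illustrated with a minimal sketch. Note the caveats: the actual ArceeFusion tool applies a more selective fusion strategy than plain interpolation, real checkpoints hold tensors rather than scalars, and the `merge_experts` helper and `alpha` weight below are illustrative assumptions, not part of the released pipeline:

```python
# Illustrative sketch only: fuse two domain-expert checkpoints into one model
# by element-wise parameter interpolation. ArceeFusion's real method is more
# sophisticated; this just shows the shape of the operation.
def merge_experts(math_expert, code_expert, alpha=0.5):
    """Return a state dict interpolated as alpha * math + (1 - alpha) * code."""
    assert math_expert.keys() == code_expert.keys(), "experts must share an architecture"
    return {
        name: alpha * math_expert[name] + (1 - alpha) * code_expert[name]
        for name in math_expert
    }

# Toy "state dicts" with scalar stand-ins for weight tensors:
math_sd = {"layer.weight": 1.0, "layer.bias": 0.0}
code_sd = {"layer.weight": 3.0, "layer.bias": 2.0}
merged = merge_experts(math_sd, code_sd, alpha=0.5)
print(merged)  # {'layer.weight': 2.0, 'layer.bias': 1.0}
```

Because both experts start from the same DeepSeek-R1-Distill-Qwen-32B base, their parameters are aligned layer-by-layer, which is what makes this kind of merging well-defined in the first place.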
Use Cases
FairyR1-32B is particularly well-suited for applications requiring strong mathematical reasoning and code generation capabilities where computational resources and inference costs are critical considerations.