Overview
FairyR1-14B-Preview is a highly efficient 14-billion-parameter large language model from PKU-DS-LAB, designed to deliver strong performance on specialized tasks while reducing model size and inference cost. It builds on the DeepSeek-R1-Distill-Qwen-14B base model, employing a refined 'Branch-Merge Distillation' approach previously explored in TinyR1.
Key Capabilities & Innovations
- Optimized for Math and Code: Excels in mathematical reasoning and coding tasks, demonstrating superior performance on benchmarks like AIME 2024/2025 (Math) and LiveCodeBench (Code) compared to larger models such as DeepSeek-R1-Distill-Qwen-32B.
- Efficient Distillation Pipeline: Features an overhauled data distillation pipeline that runs raw examples from datasets such as AIMO/NuminaMath-1.5 and OpenThoughts-114k through 'teacher' models, then applies multi-stage filtering and refinement, particularly to the resulting Chain-of-Thought (CoT) trajectories (see the filtering sketch after this list).
- Model Fusion: Leverages the Arcee Fusion tool to merge two independently trained domain experts (math and code) into a single 14B-parameter model, streamlining the merging process and reducing computational overhead (a conceptual merging sketch follows this list).
- Resource-Efficient Training: Achieves competitive results with significantly fewer parameters and lower computational cost: training the math and code specialists took only 2.5 and 1.5 hours, respectively, on 16 NVIDIA H100 GPUs, and model merging required no GPU.
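The model card does not publish the filtering code itself, so the sketch below is only a rough illustration of what a multi-stage CoT filter might look like, assuming each distilled example carries the teacher's chain-of-thought, its final answer, and a reference answer. All field names and thresholds here are hypothetical.

```python
# Hypothetical sketch of a multi-stage CoT filter; field names and
# thresholds are illustrative assumptions, not FairyR1's actual pipeline.

def keep_example(example: dict,
                 min_cot_tokens: int = 64,
                 max_cot_tokens: int = 8192) -> bool:
    """Apply simple structural and correctness checks to one distilled example."""
    cot = example.get("cot", "")
    answer = example.get("answer")
    reference = example.get("reference_answer")

    # Stage 1: drop truncated or runaway chains of thought.
    n_tokens = len(cot.split())
    if not (min_cot_tokens <= n_tokens <= max_cot_tokens):
        return False

    # Stage 2: require the trace to actually reach a final answer.
    if answer is None or not cot.strip():
        return False

    # Stage 3: keep only traces whose answer matches the reference.
    return str(answer).strip() == str(reference).strip()


raw_examples = [
    {"cot": "Let x = 2. Then " + "step " * 100, "answer": "4", "reference_answer": "4"},
    {"cot": "", "answer": "7", "reference_answer": "7"},  # rejected: empty trace
]
filtered = [ex for ex in raw_examples if keep_example(ex)]
print(f"kept {len(filtered)} of {len(raw_examples)} examples")
```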
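Arcee Fusion itself ships with Arcee's MergeKit and fuses parameters selectively rather than averaging them, and the card does not spell out FairyR1's exact configuration. As a deliberately simplified conceptual stand-in only, the sketch below blends two expert checkpoints by plain element-wise weight interpolation; the checkpoint paths and the 50/50 mix are hypothetical.

```python
# Conceptual stand-in: plain weight interpolation between two domain experts.
# This is NOT the Arcee Fusion algorithm; paths and the 50/50 mix are
# illustrative assumptions.
import torch

def interpolate_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Element-wise blend of two state dicts with identical architectures."""
    assert sd_a.keys() == sd_b.keys(), "experts must share an architecture"
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

math_expert = torch.load("math_expert.pt")   # hypothetical checkpoint path
code_expert = torch.load("code_expert.pt")   # hypothetical checkpoint path
merged = interpolate_state_dicts(math_expert, code_expert, alpha=0.5)
torch.save(merged, "merged_14b.pt")
```

Because this kind of merging operates purely on saved weights, it runs entirely on CPU, which is consistent with the no-GPU merging step described above.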
When to Use This Model
FairyR1-14B-Preview is ideal for applications requiring high accuracy in mathematical problem-solving and code generation, particularly when computational resources or inference speed are critical considerations. Its specialized training makes it a strong candidate for focused tasks in these domains.
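For quick experimentation, a minimal inference sketch with Hugging Face Transformers is shown below. The repo id is assumed from the model name and may differ from the published one.

```python
# Minimal inference sketch with Hugging Face Transformers; the repo id below
# is an assumption based on the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PKU-DS-LAB/FairyR1-14B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```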