PKU-DS-LAB/FairyR1-14B-Preview

Text generation · Concurrency cost: 1 · Model size: 14B · Quantization: FP8 · Context length: 32k · Published: May 26, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

FairyR1-14B-Preview, developed by PKU-DS-LAB, is a 14-billion parameter large language model built upon the DeepSeek-R1-Distill-Qwen-14B base. Utilizing a 'distill-and-merge' pipeline, it achieves competitive performance in mathematical and coding tasks, often matching or exceeding larger models. This model is specifically optimized for efficiency and strong task-specific performance in math and programming domains.


Overview

FairyR1-14B-Preview is a highly efficient 14-billion parameter large language model from PKU-DS-LAB, designed to deliver strong performance in specialized tasks while reducing model size and inference costs. It builds upon the DeepSeek-R1-Distill-Qwen-14B base, employing a refined 'Branch-Merge Distillation' approach previously explored in TinyR1.

Key Capabilities & Innovations

  • Optimized for Math and Code: Excels in mathematical reasoning and coding tasks, demonstrating superior performance on benchmarks like AIME 2024/2025 (Math) and LiveCodeBench (Code) compared to larger models such as DeepSeek-R1-Distill-Qwen-32B.
  • Efficient Distillation Pipeline: Features an overhauled data distillation pipeline that processes raw examples from datasets like AIMO/NuminaMath-1.5 and OpenThoughts-114k through 'teacher' models, followed by multi-stage filtering and refinement, especially for Chain-of-Thought (CoT) trajectories.
  • Model Fusion: Leverages the AcreeFusion tool to merge two independently trained domain experts (math and code) into a single 14B-parameter model, streamlining the merging process and reducing computational overhead.
  • Resource-Efficient Training: Achieves competitive results with significantly fewer parameters and lower computational costs: training the math and coding specialists took only 2.5 hours and 1.5 hours respectively on 16 NVIDIA H100 GPUs, and model merging required no GPU at all.
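The multi-stage filtering step described above can be illustrated with a minimal sketch. The exact criteria used by PKU-DS-LAB are not published here, so the correctness check and token-budget cutoff below are hypothetical stand-ins for the real filters applied to teacher-generated Chain-of-Thought (CoT) trajectories:

```python
# Illustrative sketch of multi-stage CoT filtering. The filter criteria
# (answer correctness, reasoning-length budget) are assumptions, not the
# actual FairyR1 pipeline.

def filter_cot_traces(traces, max_tokens=8192):
    """Stage 1: keep traces whose final answer matches the reference.
    Stage 2: keep traces whose reasoning fits a token budget."""
    correct = [t for t in traces if t["answer"] == t["reference"]]
    concise = [t for t in correct if len(t["cot"].split()) <= max_tokens]
    return concise

traces = [
    {"cot": "factor the quadratic then solve", "answer": "42", "reference": "42"},
    {"cot": "misapply the formula", "answer": "7", "reference": "42"},
]
kept = filter_cot_traces(traces)
print(len(kept))  # only the trace with the correct answer survives
```

In practice each stage would also involve heavier checks (deduplication, verifier models, difficulty balancing), but the pipeline shape stays the same: a cascade of increasingly selective filters over raw teacher outputs.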
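The model-fusion step can be sketched in a few lines. AcreeFusion's actual algorithm is not detailed here, so the snippet below shows only the simplest possible merge strategy, a parameter-wise linear interpolation of two expert checkpoints, using plain Python lists in place of real tensors:

```python
# Minimal sketch of merging two domain-expert checkpoints by linear
# interpolation. This is an assumption for illustration; AcreeFusion
# may use a more sophisticated merging rule.

def merge_state_dicts(expert_a, expert_b, alpha=0.5):
    """Parameter-wise interpolation: alpha * a + (1 - alpha) * b.
    Both checkpoints must share identical parameter names and shapes."""
    assert expert_a.keys() == expert_b.keys(), "architectures must match"
    return {
        name: [alpha * a + (1 - alpha) * b
               for a, b in zip(expert_a[name], expert_b[name])]
        for name in expert_a
    }

math_expert = {"layer0.weight": [1.0, 2.0]}
code_expert = {"layer0.weight": [3.0, 4.0]}
merged = merge_state_dicts(math_expert, code_expert)
print(merged["layer0.weight"])  # [2.0, 3.0]
```

Because merging like this is pure arithmetic over saved weights, it needs no forward passes and hence no GPU, which is consistent with the training-cost note above.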

When to Use This Model

FairyR1-14B-Preview is ideal for applications requiring high accuracy in mathematical problem-solving and code generation, particularly when computational resources or inference speed are critical considerations. Its specialized training makes it a strong candidate for focused tasks in these domains.