zbeeb/deepseek-r1-distill-qwen-14b-fast-math-r1-sft-10ep

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:14.8BQuant:FP8Ctx Length:32kPublished:May 28, 2026License:mitArchitecture:Transformer Open Weights Warm

The zbeeb/deepseek-r1-distill-qwen-14b-fast-math-r1-sft-10ep is a 14.8 billion parameter language model, fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B. It is specifically optimized for mathematical reasoning, designed to produce step-by-step solutions with final answers enclosed in \boxed{}. This model excels in math reasoning research and AIMO-style problem-solving experiments, supporting a context length of up to 32768 tokens.

Loading preview...

Model Overview

This model, zbeeb/deepseek-r1-distill-qwen-14b-fast-math-r1-sft-10ep, is a 14.8 billion parameter supervised fine-tune of the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B base model. It was trained for 10 epochs on the RabotniKuma/Fast-Math-R1-SFT dataset, specifically targeting enhanced mathematical reasoning capabilities.

Key Capabilities

  • Step-by-Step Math Reasoning: Designed to generate detailed, step-by-step mathematical solutions.
  • Boxed Answers: Formats final answers within \boxed{} as required for specific math problem-solving contexts.
  • Long Context Support: Trained with a maximum sequence length of 24,000 tokens, supporting long-context applications.
  • AIMO-style Problem Solving: Optimized for tasks similar to those found in the American Invitational Mathematics Examination (AIME) or International Mathematical Olympiad (IMO).

Training Details

The model underwent full-parameter supervised fine-tuning using 6 NVIDIA H200 GPUs. It utilized a global batch size of 48, a learning rate of 1e-5 with a cosine scheduler, and bfloat16 precision. The training recipe was adapted from the Fast-Math-R1 style, focusing on robust mathematical instruction following.

Intended Use Cases

  • Math Reasoning Research: Ideal for experiments and studies focused on improving AI's mathematical problem-solving abilities.
  • AIMO-style Problem Solving: Suitable for developing and testing solutions for competitive mathematics problems.
  • Long-Context SFT Experiments: Can be used for further research into supervised fine-tuning with extended context lengths.

Limitations

Users should be aware that the model's outputs have not been independently benchmarked and may produce incorrect reasoning or malformed answers. Validation of outputs is crucial for applications where correctness is critical. While trained on 24k-token sequences, practical inference context length may vary based on deployment specifics.