Model Overview

This model, developed by Shengjia (University of Toronto), is a continually fine-tuned version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model, optimized for mathematical reasoning tasks. It has 1.78 billion parameters and an extended context length of 24,576 tokens, which suits problems that require long reasoning chains.

Key Capabilities & Training

Mathematical Reasoning: On the AIME 2024 high school mathematics competition, the model reaches 44.38% pass@1 and 80.00% pass@16 (using 16 samples per prompt).
Extended Context: Features a 24,576-token context window, enabling it to process and reason over longer problem descriptions and solution steps.
Advanced Fine-tuning: Trained for 175 steps using the GAPO-GSPO (Geometric Adaptive Policy Optimization with Group-level Shapley Policy Optimization) method on the DeepScaleR dataset, which comprises 39,207 math problems.
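
The model card does not say how the pass@1 and pass@16 figures were computed. As a minimal sketch, assuming the commonly used unbiased pass@k estimator (1 - C(n-c, k) / C(n, k) for n samples with c correct), the metrics could be derived from per-problem correct counts as follows. The function names and the example counts are hypothetical and are not the model's actual results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct answer.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimate over all problems in a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical correct counts out of 16 samples for four problems (not real data).
counts = [16, 0, 8, 4]
print(mean_pass_at_k(counts, n=16, k=1))   # 0.4375
print(mean_pass_at_k(counts, n=16, k=16))  # 0.75
```

With 16 samples per prompt, pass@1 under this estimator is simply the fraction of correct samples averaged over problems, and pass@16 is the fraction of problems with at least one correct sample.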

Recommended Usage

This model is particularly well-suited for:

Solving advanced high school mathematics problems, especially those found in competitions like AIME.
Applications requiring robust mathematical reasoning within a substantial context window.

For best results on AIME-style problems, use a temperature of 0.6 and a top-p of 1.0, generate 16 samples per prompt, and set the maximum response length between 8k and 24k tokens.
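
The recommended settings can be collected into a set of sampling arguments. The sketch below is an illustration only: the helper name is hypothetical, the keys follow the argument names used by Hugging Face transformers `generate`, and it assumes "8k" and "24k" mean 8,192 and 24,576 tokens, which the model card does not state explicitly.

```python
# Assumed token bounds for "8k" and "24k"; the model card gives only the rounded figures.
MIN_RESPONSE_TOKENS = 8192
MAX_RESPONSE_TOKENS = 24576


def build_sampling_kwargs(max_new_tokens: int = MIN_RESPONSE_TOKENS,
                          num_samples: int = 16) -> dict:
    """Return sampling arguments matching the recommended AIME-style settings."""
    if not MIN_RESPONSE_TOKENS <= max_new_tokens <= MAX_RESPONSE_TOKENS:
        raise ValueError("max_new_tokens should be between 8192 and 24576")
    return {
        "do_sample": True,            # sample instead of greedy decoding
        "temperature": 0.6,           # recommended temperature
        "top_p": 1.0,                 # no nucleus truncation
        "num_return_sequences": num_samples,  # 16 samples per prompt
        "max_new_tokens": max_new_tokens,     # cap on response length
    }


kwargs = build_sampling_kwargs(max_new_tokens=16384)
print(kwargs)
```

With transformers, a dictionary like this would typically be passed as `model.generate(**inputs, **kwargs)`. Because the context window is 24,576 tokens, the prompt and the response together presumably need to fit within that limit, so a long prompt may call for a smaller `max_new_tokens`.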
