Model Overview

This model, developed by Shengjia, University of Toronto, continues training from the DeepSeek-R1-Distill-Qwen-1.5B base model. It has been fine-tuned for a further 175 steps on advanced mathematical reasoning tasks. With 1.78 billion parameters and an extended context window of 24,576 tokens, it is intended for long, multi-step problem solving.

Key Capabilities & Training

Mathematical Reasoning: Specifically optimized for high school-level mathematics, demonstrated by its strong performance on the AIME 2024 benchmark.
Extended Context: Features a 24,576-token context length, allowing it to process and generate longer, more intricate mathematical solutions.
Advanced Training Method: Trained with GAPO-GSPO (Geometric Adaptive Policy Optimization with Group-level Shapley Policy Optimization) without a KL divergence penalty, on the DeepScaleR dataset of 39,207 math problems.

Performance Highlights

Achieves 44.0% pass@1 and 73.3% pass@32 on the AIME 2024 competition (30 problems).
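The card does not say how pass@1 and pass@32 were computed. As a rough sketch, the commonly used unbiased pass@k estimator (from n samples per problem, c of them correct) could be applied as follows. The function names and the per-problem counts are hypothetical and only illustrate the arithmetic; they are not the author's evaluation code or data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that were correct, k: budget being scored.
    Assumption: this standard estimator is used; the card does not confirm it.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimate over the whole benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts for 30 problems with 32 samples each:
# 22 problems solved at least once, 8 never solved.
counts = [1] * 22 + [0] * 8
score = benchmark_pass_at_k(counts, n=32, k=32)
```

With n = k = 32, each problem contributes either 0 or 1, so a score of 73.3% on 30 problems would be consistent with 22 problems solved at least once. For k = 1 the estimator reduces to c / n, the fraction of correct samples.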

Recommended Usage

For best results on AIME-style problems, use a temperature of 0.6 and a top_p of 1.0, and generate 16-32 samples per problem with a maximum output length of 8k-24k tokens, depending on problem complexity.
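The recommended settings can be collected into a small helper. This is a minimal sketch, not part of the model card: `aime_sampling_kwargs` is a hypothetical function, the dictionary keys follow the keyword names accepted by Hugging Face `generate`, and it assumes the prompt and the generated tokens share the 24,576-token context window.

```python
CONTEXT_WINDOW = 24_576  # context length stated in the model card


def aime_sampling_kwargs(prompt_tokens: int,
                         num_samples: int = 16,
                         max_new_tokens: int = 8_192) -> dict:
    """Build sampling settings from the card's recommendations (hypothetical helper)."""
    if num_samples < 1:
        raise ValueError("num_samples must be positive")
    if not 0 <= prompt_tokens < CONTEXT_WINDOW:
        raise ValueError("prompt does not fit in the context window")
    # Assumption: prompt and output tokens share one window, so cap the output
    # at whatever room the prompt leaves.
    budget = CONTEXT_WINDOW - prompt_tokens
    return {
        "do_sample": True,                    # sampling rather than greedy decoding
        "temperature": 0.6,                   # recommended temperature
        "top_p": 1.0,                         # recommended top_p (no nucleus truncation)
        "num_return_sequences": num_samples,  # 16-32 samples per problem recommended
        "max_new_tokens": min(max_new_tokens, budget),
    }


# Example: a 500-token problem statement, 32 samples, largest output budget.
kwargs = aime_sampling_kwargs(prompt_tokens=500, num_samples=32, max_new_tokens=24_576)
```

The resulting dictionary could be passed as keyword arguments to a `generate` call once the model and tokenizer are loaded. Generating 32 long sequences in a single call may exceed available memory, so splitting them into smaller batches is a reasonable variation.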
