shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass32_73
The shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass32_73 model is a 1.78-billion-parameter Qwen-based language model, continually fine-tuned by Shengjia (University of Toronto) for mathematical reasoning. It has an extended 24,576-token context length and was trained with GAPO-GSPO on the DeepScaleR dataset. On the AIME 2024 benchmark, a high school mathematics competition, it achieves 44% pass@1 and 73% pass@32.
Model Overview
This model, developed by Shengjia (University of Toronto), is a specialized continuation of the DeepSeek-R1-Distill-Qwen-1.5B base model, continually fine-tuned for 175 training steps on advanced mathematical reasoning. It has 1.78 billion parameters and an extended 24,576-token context window, and it is intended for complex, multi-step problem solving.
Key Capabilities & Training
- Mathematical Reasoning: Optimized for high school competition mathematics, as reflected in its AIME 2024 results.
- Extended Context: A 24,576-token context length leaves room for processing and generating long, intricate mathematical solutions.
- Training Method: Trained with GAPO-GSPO (Geometric Adaptive Policy Optimization with Group-level Shapley Policy Optimization) without a KL-divergence penalty, on the DeepScaleR dataset of 39,207 math problems.
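Because the prompt and the generated solution share one 24,576-token window, the room left for output shrinks as the prompt grows. The sketch below shows that arithmetic; `max_generation_budget` is a hypothetical helper, and it assumes standard decoder-only inference in which prompt and output tokens count against the same window.

```python
CONTEXT_LENGTH = 24_576  # context window stated on this card


def max_generation_budget(prompt_tokens: int, requested: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit after the prompt (hypothetical helper).

    Assumes prompt and output tokens share a single context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    available = context_length - prompt_tokens
    if available <= 0:
        raise ValueError("prompt alone fills the context window")
    # Never ask for more tokens than the window can still hold.
    return min(requested, available)
```

For example, a 300-token prompt with a requested 24,576-token output is clamped to 24,276 new tokens, while a request for 8,192 tokens passes through unchanged.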
Performance Highlights
- Achieves 44.0% pass@1 and 73.3% pass@32 on the 30 problems of AIME 2024, indicating that repeated sampling recovers substantially more correct answers than a single attempt.
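The card does not say how pass@k was computed. A common choice is the unbiased estimator 1 - C(n - c, k) / C(n, k) for one problem with n samples of which c are correct, averaged across problems. The sketch below assumes that estimator; the helper names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct.

    Probability that a random size-k subset of the n samples contains
    at least one correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under this reading, pass@1 reduces to the mean per-sample accuracy c / n, and 73.3% pass@32 on 30 problems corresponds to 22 problems solved in at least one of 32 samples (22 / 30 ≈ 0.733).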
Recommended Usage
For AIME-style problems, the recommended settings are a temperature of 0.6, top_p of 1.0, 16-32 samples per problem, and a maximum output of 8k-24k tokens depending on problem difficulty.
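These recommended settings can be collected in one place. The following is a minimal sketch that assumes 8k means 8,192 and 24k means 24,576 tokens. `AimeSamplingConfig` is hypothetical, and its field names were chosen to match the `temperature`, `top_p`, `n`, and `max_tokens` arguments of vLLM's `SamplingParams`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AimeSamplingConfig:
    """Hypothetical container for the card's recommended AIME settings."""

    temperature: float = 0.6
    top_p: float = 1.0
    n: int = 16               # samples per problem; card suggests 16-32
    max_tokens: int = 8_192   # card suggests 8k-24k (assumed 8,192-24,576)

    def __post_init__(self) -> None:
        # Reject values outside the card's recommended ranges.
        if not 16 <= self.n <= 32:
            raise ValueError("card recommends 16-32 samples per problem")
        if not 8_192 <= self.max_tokens <= 24_576:
            raise ValueError("card recommends 8k-24k output tokens")
```

With vLLM installed, `SamplingParams(**asdict(AimeSamplingConfig(n=32, max_tokens=16_384)))` would request 32 samples per prompt. The range checks keep runs within the card's recommendations; drop them to experiment outside those ranges. The prompt and output together must still fit the 24,576-token context.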