shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass32_73

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 23, 2026Architecture:Transformer Cold

The shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass32_73 model is a 1.78 billion parameter Qwen-based language model, continually fine-tuned by Shengjia, University of Toronto, for mathematical reasoning. It features an extended 24,576-token context length and utilizes GAPO-GSPO training on the DeepScaleR dataset. This model excels in high school mathematics competitions, achieving 44% pass@1 and 73% pass@32 on AIME 2024 benchmarks.

Loading preview...

Model Overview

This model, developed by Shengjia, University of Toronto, is a specialized continuation of the DeepSeek-R1-Distill-Qwen-1.5B base model. It has been continually fine-tuned for 175 steps, focusing on advanced mathematical reasoning tasks. With 1.78 billion parameters and an extended context window of 24,576 tokens, it is designed to handle complex problem-solving.

Key Capabilities & Training

  • Mathematical Reasoning: Specifically optimized for high school-level mathematics, demonstrated by its strong performance on the AIME 2024 benchmark.
  • Extended Context: Features a 24,576-token context length, allowing for processing and generating longer, more intricate mathematical solutions.
  • Advanced Training Method: Utilizes GAPO-GSPO (Geometric Adaptive Policy Optimization with Group-level Shapley Policy Optimization) without KL divergence penalty, trained on the DeepScaleR dataset comprising 39,207 math problems.

Performance Highlights

  • Achieves 44.0% pass@1 and 73.3% pass@32 on the challenging AIME 2024 competition (30 problems), indicating robust problem-solving abilities.

Recommended Usage

For optimal results on AIME-style problems, it is recommended to use a temperature of 0.6, top_p of 1.0, and generate 16-32 samples per problem with a max token output of 8k-24k, depending on complexity.