shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass16_80

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 24, 2026 | Architecture: Transformer | Cold

shengjia-toronto/r1distill-qwen1.5b-24k-gapo-gspo-step175-aime24-pass1_44-pass16_80 is a 1.78-billion-parameter Qwen-based language model, continually fine-tuned by Shengjia (University of Toronto) for mathematical reasoning. It has an extended 24,576-token context length and was trained using GAPO-GSPO on the DeepScaleR dataset. On the AIME 2024 benchmark it reaches 44.38% pass@1 and 80.00% pass@16, so its main target is competition-level high school mathematics problems.


Model Overview

This model, developed by Shengjia (University of Toronto), is a continually fine-tuned version of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model, optimized for mathematical reasoning tasks. It has 1.78 billion parameters and an extended context length of 24,576 tokens, which leaves room for long problem statements and multi-step solutions.

Key Capabilities & Training

  • Mathematical Reasoning: Achieves strong performance on the challenging AIME 2024 high school mathematics competition, with a 44.38% pass@1 and 80.00% pass@16 (using 16 samples per prompt).
  • Extended Context: Features a 24,576-token context window, enabling it to process and reason over longer problem descriptions and solution steps.
  • Advanced Fine-tuning: Trained for 175 steps using the GAPO-GSPO (Geometric Adaptive Policy Optimization with Group-level Shapley Policy Optimization) method on the DeepScaleR dataset, which comprises 39,207 math problems.
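The pass@1 and pass@16 figures above can be related to per-problem sample counts. The sketch below uses the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn per problem and c is the number that were correct. The source does not describe how its scores were computed, so this is an assumed evaluation procedure; the function names and the toy counts are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over all problems in a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Toy data (NOT from the model card): correct samples out of 16 for 4 problems.
toy_counts = [0, 16, 8, 4]
toy_pass_at_1 = benchmark_pass_at_k(toy_counts, n=16, k=1)
toy_pass_at_16 = benchmark_pass_at_k(toy_counts, n=16, k=16)
```

With k = 1 the estimator reduces to the fraction of correct samples, and with k = n it reduces to whether any sample was correct. As an illustration only: AIME 2024 has 30 problems, so with 16 samples each, 213 correct samples out of 480 would give 44.375% pass@1, and 24 problems solved at least once would give 80% pass@16. These counts are not stated in the source.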

Recommended Usage

This model is particularly well-suited for:

  • Solving advanced high school mathematics problems, especially those found in competitions like AIME.
  • Applications requiring robust mathematical reasoning within a substantial context window.

For AIME-style problems, the recommended settings are a temperature of 0.6, a top-p of 1.0, and 16 samples per prompt, with a maximum response length between 8k and 24k tokens.
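The recommended settings can be captured in a small configuration object, sketched below. The class name is hypothetical, and the field names (temperature, top_p, n, max_tokens) follow common OpenAI-style sampling parameters, which is an assumption about the serving stack rather than something the source specifies. The sketch also assumes "8k" means 8,192 tokens and "24k" means 24,576 tokens, the latter matching the context length quoted above.

```python
from dataclasses import asdict, dataclass

# Assumed token counts for the "8k" and "24k" bounds in the recommendation.
MIN_RESPONSE_TOKENS = 8_192
MAX_RESPONSE_TOKENS = 24_576


@dataclass(frozen=True)
class AimeSamplingSettings:
    """Hypothetical container for the recommended AIME sampling settings."""

    temperature: float = 0.6
    top_p: float = 1.0
    n: int = 16  # samples generated per prompt
    max_tokens: int = MAX_RESPONSE_TOKENS  # maximum response length

    def __post_init__(self) -> None:
        # Reject response budgets outside the recommended 8k-24k range.
        if not MIN_RESPONSE_TOKENS <= self.max_tokens <= MAX_RESPONSE_TOKENS:
            raise ValueError(
                f"max_tokens must be in [{MIN_RESPONSE_TOKENS}, {MAX_RESPONSE_TOKENS}]"
            )

    def to_kwargs(self) -> dict:
        """Return the settings as keyword arguments for a generation call."""
        return asdict(self)


settings = AimeSamplingSettings()
sampling_kwargs = settings.to_kwargs()
```

The resulting dictionary can be passed to whichever inference client is in use, after renaming keys if that client expects different parameter names.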