asparius/Qwen2.5-1.5B-SPO-1ep-iter2

Text Generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

The asparius/Qwen2.5-1.5B-SPO-1ep-iter2 model is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B. It was trained on the DigitalLearningGmbH/MATH-lighteval dataset using GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning. The model is aimed at tasks that demand strong mathematical problem solving, and its 131,072-token context length accommodates long, multi-step problems.


Model Overview

This model, asparius/Qwen2.5-1.5B-SPO-1ep-iter2, is a fine-tuned version of the Qwen2.5-1.5B base model, trained on the DigitalLearningGmbH/MATH-lighteval dataset with a focus on improving mathematical reasoning.
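A minimal inference sketch using the Hugging Face transformers library is shown below. The prompt format and generation settings are illustrative assumptions; since the checkpoint descends from a non-instruct base model, a plain completion-style prompt is used.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asparius/Qwen2.5-1.5B-SPO-1ep-iter2"

# Load the tokenizer and weights; BF16 matches the published quantization
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Completion-style prompt; the exact format the model expects is an assumption
prompt = "Problem: What is the sum of the first 100 positive integers?\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```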

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B, a 1.5 billion parameter language model.
  • Training Method: Fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300); a training sketch follows this list.
  • Dataset: Specifically trained on the DigitalLearningGmbH/MATH-lighteval dataset, indicating an optimization for mathematical tasks.
  • Context Length: A 131,072-token context window, beneficial for long, multi-step problems with extensive input.
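For readers who want to see what GRPO training looks like in practice, the sketch below uses GRPOTrainer from Hugging Face's TRL library. It illustrates the general recipe rather than the author's actual training script: the reward function is a placeholder, and the dataset config and column names (problem mapped to prompt) are assumptions.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# MATH-lighteval pairs problems with reference solutions; GRPOTrainer expects
# a "prompt" column (the config/split and field names here are assumptions)
dataset = load_dataset("DigitalLearningGmbH/MATH-lighteval", split="train")
dataset = dataset.map(lambda x: {"prompt": x["problem"]})

def reward_fn(completions, **kwargs):
    # Placeholder reward: favors completions that state a final boxed answer.
    # A real setup would verify the answer against the reference solution.
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="qwen2.5-1.5b-grpo", num_train_epochs=1)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B",
    reward_funcs=reward_fn,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

The one-epoch setting mirrors the "1ep" in the model name; everything else (batch size, generations per prompt, learning rate) is left at TRL defaults and would need tuning in a real run.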

Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical Reasoning: Solving mathematical problems and reasoning about numerical relationships, the focus of its fine-tuning.
  • Educational Tools: Can be integrated into systems for generating math explanations, solving equations, or assisting with mathematical homework.
  • Research in Mathematical AI: Provides a baseline for further work on enhancing the mathematical capabilities of small language models; a rough evaluation sketch follows this list.
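As a starting point for the research use case, the sketch below scores the model on a small sample of MATH-lighteval problems. It is a rough harness, not a standard benchmark run: the answer-extraction heuristic, the dataset split, and the field names are assumptions, and a proper evaluation would use a framework such as lighteval.

```python
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asparius/Qwen2.5-1.5B-SPO-1ep-iter2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def extract_boxed(text):
    # Naive heuristic: take the contents of the last \boxed{...} span
    # (assumes both model output and reference solutions use this convention)
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

# Split and config names may need adjusting for this dataset
dataset = load_dataset("DigitalLearningGmbH/MATH-lighteval", split="test")

sample = dataset.select(range(50))  # small sample for a quick check
correct = 0
for example in sample:
    prompt = f"Problem: {example['problem']}\nSolution:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    if extract_boxed(completion) == extract_boxed(example["solution"]):
        correct += 1

print(f"Boxed-answer exact match on sample: {correct / len(sample):.2%}")
```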