Arthur-S/qwen2.5-math-1.5b-dpo-gsm8k

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 14, 2026Architecture:Transformer Warm

The Arthur-S/qwen2.5-math-1.5b-dpo-gsm8k model is a 1.5 billion parameter language model based on the Qwen2.5 architecture. This model is specifically fine-tuned using DPO (Direct Preference Optimization) for mathematical reasoning tasks, particularly excelling on the GSM8K benchmark. It is designed for applications requiring robust problem-solving capabilities in mathematics, leveraging its 32768 token context length for complex calculations.

Loading preview...

Model Overview

The Arthur-S/qwen2.5-math-1.5b-dpo-gsm8k is a 1.5 billion parameter model built upon the Qwen2.5 architecture. It has been specifically fine-tuned using Direct Preference Optimization (DPO) to enhance its performance in mathematical reasoning.

Key Capabilities

  • Mathematical Reasoning: Optimized for solving mathematical problems, particularly demonstrated by its focus on the GSM8K benchmark.
  • Large Context Window: Features a substantial 32768 token context length, enabling it to process and understand complex mathematical problems and multi-step reasoning.
  • Qwen2.5 Architecture: Benefits from the underlying Qwen2.5 base model's robust language understanding and generation capabilities.

Intended Use Cases

This model is particularly well-suited for:

  • Educational Tools: Assisting students with math homework or providing step-by-step solutions.
  • Automated Problem Solving: Developing applications that require automated solutions to arithmetic and algebraic problems.
  • Research in Mathematical AI: Serving as a base for further research and fine-tuning on specialized mathematical datasets.

Limitations

As indicated by the provided model card, specific details regarding training data, evaluation metrics, and potential biases are not yet available. Users should exercise caution and conduct their own evaluations for critical applications.