rediska0123/qwen2.5-math-1.5b-dpo-gsm8k-v2

Hugging Face · Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Mar 4, 2026 · Architecture: Transformer

The rediska0123/qwen2.5-math-1.5b-dpo-gsm8k-v2 model is a 1.5-billion-parameter language model with a 32,768-token context length. Developed by rediska0123, it is fine-tuned for mathematical reasoning using DPO (Direct Preference Optimization) on the GSM8K dataset. Its primary strength is solving grade school math word problems, making it suitable for applications that require numerical and logical problem solving.

Model Overview

rediska0123/qwen2.5-math-1.5b-dpo-gsm8k-v2 pairs a compact 1.5-billion-parameter architecture with a substantial 32,768-token context window. It was fine-tuned with Direct Preference Optimization (DPO) on GSM8K, a benchmark of grade school math word problems, giving it a strong focus on step-by-step mathematical reasoning and problem solving.

Key Capabilities

  • Mathematical Reasoning: Optimized for solving grade school math word problems, as evidenced by its GSM8K training (see the usage sketch after this list).
  • Efficient Size: With 1.5 billion parameters, it offers a balance between performance and computational efficiency for specialized tasks.
  • Extended Context: A 32768 token context window allows for processing longer problem descriptions and complex mathematical sequences.

Good For

  • Educational Tools: Developing AI tutors or assistants for mathematics.
  • Automated Problem Solving: Applications requiring automated solutions to numerical and logical challenges.
  • Research in Mathematical LLMs: A baseline or comparison point for research into improving the mathematical capabilities of language models.