HenryShan/Qwen2.5-Math-7B-DPO-10K

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7.6B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: May 27, 2025
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

HenryShan/Qwen2.5-Math-7B-DPO-10K is a 7.6-billion-parameter language model fine-tuned from Qwen2.5-Math-7B by HenryShan. It targets mathematical reasoning and generates detailed, step-by-step solutions in domains such as algebra, calculus, and geometry. It was trained with Direct Preference Optimization (DPO) on the Math-Step-DPO-10K dataset to improve its problem-solving.


Overview

HenryShan/Qwen2.5-Math-7B-DPO-10K is a 7.6-billion-parameter model fine-tuned from the Qwen2.5-Math-7B base model. It focuses on mathematical reasoning and is designed to produce step-by-step solutions to complex math problems.

Key Capabilities

  • Specialized Mathematical Reasoning: Generates detailed solutions for problems in algebra, calculus, and geometry.
  • Direct Preference Optimization (DPO): Fine-tuned using DPO on the Math-Step-DPO-10K dataset to enhance the quality and clarity of its mathematical explanations.
  • Parameter-Efficient Fine-tuning: Trained with LoRA (Low-Rank Adaptation) using rank 8, alpha 10, and dropout 0.
  • Apple Silicon Compatibility: Fine-tuning was run with mlx_lm.lora on Apple Silicon Mac hardware, so the training workflow is known to work in that ecosystem.
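
As a rough illustration of two terms in the list above, the sketch below computes the standard per-pair DPO loss and the number of extra trainable parameters LoRA adds to a single linear layer. It is not the author's training code. The value beta = 0.1, the 4096 x 4096 layer shape, and all function names are hypothetical, and the alpha / rank scaling follows the original LoRA formulation; mlx_lm.lora may apply its scale differently.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Inputs are summed log-probabilities of the chosen and rejected
    solutions under the policy being trained and under the frozen
    reference model. beta=0.1 is an assumed value, not from the card.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def lora_extra_params(d_in, d_out, rank=8):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns A (rank x d_in) and B (d_out x rank), so the count is
    rank * (d_in + d_out) instead of d_in * d_out.
    """
    return rank * (d_in + d_out)


def lora_scale(alpha=10, rank=8):
    """Scaling applied to the low-rank update in the original LoRA
    formulation (alpha / rank). Assumed convention, see lead-in."""
    return alpha / rank


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Hypothetical 4096 x 4096 projection with the card's rank of 8.
    print(lora_extra_params(4096, 4096, rank=8))
    print(lora_scale(alpha=10, rank=8))
```

The loss falls below log(2) once the policy prefers the chosen solution more strongly than the reference model does, which is the signal DPO optimizes. The parameter count shows why a rank of 8 is cheap: for the hypothetical layer above it adds 65,536 weights, against roughly 16.8 million in the full matrix.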

Good For

  • Applications requiring detailed, step-by-step mathematical problem-solving.
  • Educational tools or platforms that need to explain mathematical concepts and solutions.
  • Research into advanced mathematical reasoning capabilities of large language models.