HenryShan/Qwen2.5-Math-7B-DPO-10K
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: May 27, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights
HenryShan/Qwen2.5-Math-7B-DPO-10K is a 7.6 billion parameter language model, fine-tuned from Qwen2.5-Math-7B by HenryShan. Optimized for mathematical reasoning, this model specializes in generating detailed, step-by-step solutions across various mathematical domains like algebra, calculus, and geometry. It leverages Direct Preference Optimization (DPO) on the Math-Step-DPO-10K dataset to enhance its problem-solving capabilities.
Overview
HenryShan/Qwen2.5-Math-7B-DPO-10K is a 7.6 billion parameter model, fine-tuned from the Qwen2.5-Math-7B base model. It focuses on mathematical reasoning and is designed to produce step-by-step solutions to complex math problems.
Key Capabilities
- Specialized Mathematical Reasoning: Generates detailed, step-by-step solutions for problems in algebra, calculus, and geometry.
- Direct Preference Optimization (DPO): Fine-tuned using DPO on the Math-Step-DPO-10K dataset to enhance the quality and clarity of its mathematical explanations.
- Parameter-Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) with rank 8, alpha 10, and dropout 0, so only small low-rank adapter matrices are trained rather than the full weights.
- Apple Silicon Compatibility: Fine-tuning was run with mlx_lm.lora on Apple Silicon Mac hardware, which may indicate optimization for this ecosystem.
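The two training ingredients named in the list above, DPO and LoRA, can be illustrated with a minimal, dependency-free sketch. It is not the author's training code. The LoRA values (rank 8, alpha 10, dropout 0) come from this card; the alpha / rank scaling rule is the common LoRA convention and is an assumption here, since the training tool may scale the update differently. The DPO temperature BETA is a placeholder, because the card does not state it, and the function names are hypothetical.

```python
import math

# Values quoted in the model card.
LORA_RANK = 8
LORA_ALPHA = 10
LORA_DROPOUT = 0.0  # dropout of 0 means the adapter input is never masked

# Assumption: the card does not give the DPO temperature; 0.1 is a placeholder.
BETA = 0.1


def lora_scale(alpha=LORA_ALPHA, rank=LORA_RANK):
    """Scale of the low-rank update under the common alpha / rank convention."""
    return alpha / rank


def lora_linear(x, W, A, B, scale):
    """Compute y = W x + scale * B (A x) for one input vector x.

    W is (out, in), A is (rank, in), B is (out, rank), each a list of rows.
    Only A and B would be trained; W stays frozen.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the policy (fine-tuned) model or the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) computed as softplus(-z) in a numerically stable form
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

Under these assumptions, lora_scale() gives 10 / 8 = 1.25. The DPO loss equals log 2 when the policy and reference models agree, and it falls as the policy raises the preferred solution's log-probability relative to the rejected one.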
Good For
- Applications requiring detailed, step-by-step mathematical problem-solving.
- Educational tools or platforms that need to explain mathematical concepts and solutions.
- Research into advanced mathematical reasoning capabilities of large language models.