YuchenLi01/genSoftQwen2.5MathRM72Bth0.5pair4NoGT_1.5B_dpo_ebs32_lr5e-07_beta1.5_epoch8.0_42

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 9, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

YuchenLi01/genSoftQwen2.5MathRM72Bth0.5pair4NoGT_1.5B_dpo_ebs32_lr5e-07_beta1.5_epoch8.0_42 is a 1.5 billion parameter Qwen2.5-Instruct model fine-tuned by YuchenLi01. This model is specifically optimized for mathematical reasoning tasks, demonstrating improved reward accuracies and margins on its evaluation set. It is designed to enhance performance in scenarios requiring precise mathematical problem-solving capabilities.

Loading preview...

Model Overview

This model, genSoftQwen2.5MathRM72Bth0.5pair4NoGT_1.5B_dpo_ebs32_lr5e-07_beta1.5_epoch8.0_42, is a 1.5 billion parameter variant of the Qwen2.5-Instruct architecture. It has been fine-tuned using a specialized dataset, YuchenLi01/MATH_Qwen2.5-1.5BInstruct_Soft_DPO_Qwen2.5MathRM72B_th0.5_pair4NoGT, with a focus on mathematical reasoning.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model shows significant improvements in reward metrics, particularly in Rewards/accuracies (0.7622) and Rewards/margins (5.3692), indicating better performance in distinguishing correct from incorrect mathematical solutions.
  • Instruction Following: Built upon the Qwen2.5-Instruct base, it retains strong instruction-following capabilities.

Training Details

The model was trained for 8 epochs with a learning rate of 5e-07, using a total batch size of 32 across 8 GPUs. The training process utilized an Adam optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1.

Good For

  • Applications requiring robust mathematical problem-solving.
  • Tasks where distinguishing between correct and incorrect mathematical outputs is critical.
  • Use cases benefiting from a smaller, yet specialized, language model for numerical and logical reasoning.