Name: YuchenLi01/genSoftQwen2.5MathRM72Bth0.5pair4NoGT_1.5B_dpo_ebs32_lr5e-07_beta1.5_epoch8.0_42 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: YuchenLi01

Model Overview

This model, genSoftQwen2.5MathRM72Bth0.5pair4NoGT_1.5B_dpo_ebs32_lr5e-07_beta1.5_epoch8.0_42, is a 1.5 billion parameter variant of the Qwen2.5-Instruct architecture. It has been fine-tuned using a specialized dataset, YuchenLi01/MATH_Qwen2.5-1.5BInstruct_Soft_DPO_Qwen2.5MathRM72B_th0.5_pair4NoGT, with a focus on mathematical reasoning.

Key Capabilities

Enhanced Mathematical Reasoning: The model shows significant improvements in reward metrics, particularly in Rewards/accuracies (0.7622) and Rewards/margins (5.3692), indicating better performance in distinguishing correct from incorrect mathematical solutions.
Instruction Following: Built upon the Qwen2.5-Instruct base, it retains strong instruction-following capabilities.

Training Details

The model was trained for 8 epochs with a learning rate of 5e-07, using a total batch size of 32 across 8 GPUs. The training process utilized an Adam optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1.

Good For

Applications requiring robust mathematical problem-solving.
Tasks where distinguishing between correct and incorrect mathematical outputs is critical.
Use cases benefiting from a smaller, yet specialized, language model for numerical and logical reasoning.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)