YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42

Text generation | Model size: 1.5B | Quantization: BF16 | Context length: 32k | Published: Jul 6, 2025 | License: apache-2.0 | Architecture: Transformer | Open weights

This model is a 1.5-billion-parameter language model fine-tuned by YuchenLi01 from Qwen/Qwen2.5-1.5B-Instruct using Direct Preference Optimization (DPO) on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. The dataset name suggests a focus on mathematical reasoning and on preferring more varied (unique) responses, although the card does not describe the dataset further. The model has a context length of 32,768 tokens and was trained for 16 epochs.


Overview

This model, genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42, is a 1.5-billion-parameter language model developed by YuchenLi01. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, optimized with Direct Preference Optimization (DPO).
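DPO trains directly on pairs of preferred ("chosen") and dispreferred ("rejected") responses, using a frozen reference model instead of a separately trained reward model. A minimal sketch of the per-pair DPO loss follows, in plain Python for readability. The default β = 0.1 is our reading of the `beta0.1` token in the model name, and the optional `label_smoothing` term covers the conservative-DPO (cDPO) variant that the `cdpo` token may refer to; neither the exact loss nor any smoothing value is confirmed by this card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1, label_smoothing: float = 0.0) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    label_smoothing = 0.0 is standard DPO; a value > 0 gives a cDPO-style loss.
    beta=0.1 is an assumption taken from the model name, not a confirmed setting.
    """
    # Implicit reward difference: beta * (policy log-ratio minus reference log-ratio).
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Push the chosen response's implicit reward above the rejected one's.
    return (-(1.0 - label_smoothing) * log_sigmoid(logits)
            - label_smoothing * log_sigmoid(-logits))
```

When the policy still matches the reference, every pair gives a loss of ln 2 ≈ 0.693 regardless of `label_smoothing`, which is a useful baseline when reading the validation loss reported for this model.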

Training Details

The model was fine-tuned on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. Key training hyperparameters include:

  • Learning Rate: 1e-06
  • Total batch size: 32 (for both training and evaluation)
  • Optimizer: Adam with default betas and epsilon
  • Epochs: 16.0
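The run name appears to encode these settings: `ebs32`, `lr1e-06`, and `epoch16.0` match the list above. The sketch below decodes it, assuming `ebs` means effective batch size, `beta` is the DPO β, `cdpo` names the training method, and the trailing `42` is the random seed; these readings are ours, not stated on the card, and `RunConfig` and `parse_run_name` are hypothetical helpers.

```python
import re
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Hyperparameters decoded from a run name (token meanings are assumed)."""
    method: str
    effective_batch_size: int
    learning_rate: float
    beta: float
    epochs: float
    seed: int


def parse_run_name(name: str) -> RunConfig:
    # Assumed pattern: ..._<method>_ebs<N>_lr<X>_beta<B>_epoch<E>_<seed>
    pattern = (
        r"_(?P<method>[a-z]+)_ebs(?P<ebs>\d+)_lr(?P<lr>[0-9.e+-]+)"
        r"_beta(?P<beta>[0-9.]+)_epoch(?P<epoch>[0-9.]+)_(?P<seed>\d+)$"
    )
    m = re.search(pattern, name)
    if m is None:
        raise ValueError(f"unrecognized run name: {name}")
    return RunConfig(
        method=m["method"],
        effective_batch_size=int(m["ebs"]),
        learning_rate=float(m["lr"]),
        beta=float(m["beta"]),
        epochs=float(m["epoch"]),
        seed=int(m["seed"]),
    )


cfg = parse_run_name("genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42")
print(cfg)
```

Decoding the name this way only cross-checks the listed values; it cannot recover settings the name omits, such as warmup or the learning-rate schedule.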

Performance Metrics

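On the evaluation set, the model reached a validation loss of 1.0560 and a rewards/accuracies of 0.5500, meaning the implicit reward ranked the chosen response above the rejected one in 55% of preference pairs. The mean rewards/chosen was -0.0768 and rewards/rejected was 0.0, giving a rewards/margins (chosen minus rejected) of -0.0768. A negative mean margin means that, averaged over the evaluation set, the fine-tuned model did not assign a higher implicit reward to the preferred responses than to the rejected ones.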
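These metric names follow the usual DPO logging convention (for example in Hugging Face TRL, which we assume but cannot confirm was used): each response's implicit reward is β times the log-probability ratio between the policy and the reference model, the margin is chosen minus rejected, and accuracy is the fraction of pairs where the chosen reward is higher. A minimal sketch, assuming per-pair summed log-probabilities are already available and β = 0.1 as read from the model name; `dpo_eval_metrics` is a hypothetical helper.

```python
from statistics import mean


def dpo_eval_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Aggregate DPO reward statistics over a batch of preference pairs.

    Each argument is a list of summed sequence log-probabilities, one per pair.
    """
    # Implicit reward of each response: beta * (policy logp - reference logp).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        # Fraction of pairs where the chosen response gets the higher implicit reward.
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
    }


metrics = dpo_eval_metrics(
    policy_chosen=[-10.0, -11.0], policy_rejected=[-12.0, -12.0],
    ref_chosen=[-10.0, -10.0], ref_rejected=[-11.0, -12.0],
)
print(metrics)
```

Because the logged values are batch means, a negative mean margin can coexist with an accuracy above 0.5, as in the reported results: a few pairs with large negative margins can outweigh many pairs with small positive ones.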

Potential Use Cases

Given its fine-tuning on a dataset with "MATH" and "MoreUniqueResponse" in its name, this model is likely intended for tasks requiring:

  • Generation of diverse and preferred responses.
  • Applications involving mathematical reasoning or problem-solving, where unique and accurate outputs are valued.
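For the math-reasoning use case, one common evaluation pattern is to sample several responses per problem, extract each final answer, and measure both accuracy and how many distinct answers appear. The sketch below assumes the model writes its final answer as `\boxed{...}`, the convention used in MATH dataset solutions; whether this model reliably follows it is an assumption, and `extract_boxed` and `summarize_samples` are hypothetical helpers, not part of any released tooling.

```python
from collections import Counter


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Tracks brace depth so nested braces such as \\boxed{\\frac{1}{2}} are kept whole.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def summarize_samples(samples, reference):
    """Accuracy and answer diversity over several sampled responses to one problem."""
    answers = [extract_boxed(s) for s in samples]
    valid = [a.strip() for a in answers if a is not None]
    counts = Counter(valid)
    return {
        # Responses without an extractable answer count as incorrect.
        "accuracy": sum(a == reference for a in valid) / len(samples),
        "distinct_answers": len(counts),
        "majority_answer": counts.most_common(1)[0][0] if counts else None,
    }


samples = [
    "... so the answer is \\boxed{\\frac{1}{2}}.",
    "Thus \\boxed{\\frac{1}{2}}",
    "I get \\boxed{3}",
    "no final answer here",
]
summary = summarize_samples(samples, reference="\\frac{1}{2}")
print(summary)
```

Counting distinct final answers is only a crude diversity signal; it says nothing about how varied the reasoning paths are, and exact string matching will miss equivalent answers written differently.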