YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42
This model is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct by YuchenLi01. It was trained using Direct Preference Optimization (DPO) on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset, suggesting an optimization for generating unique and preferred responses, potentially in mathematical or reasoning contexts. The model has a context length of 32768 tokens and was trained for 16 epochs.
Loading preview...
Overview
This model, genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42, is a 1.5 billion parameter language model developed by YuchenLi01. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, specifically optimized using Direct Preference Optimization (DPO).
Training Details
The model was fine-tuned on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. Key training hyperparameters include:
- Learning Rate: 1e-06
- Batch Size: 32 (total train and eval)
- Optimizer: Adam with default betas and epsilon
- Epochs: 16.0
Performance Metrics
During evaluation, the model achieved a validation loss of 1.0560 and a rewards/accuracies score of 0.5500. The DPO training process aimed to improve response quality, indicated by rewards/chosen of -0.0768 and rewards/rejected of 0.0, with a margin of -0.0768.
Potential Use Cases
Given its fine-tuning on a dataset with "MATH" and "MoreUniqueResponse" in its name, this model is likely intended for tasks requiring:
- Generation of diverse and preferred responses.
- Applications involving mathematical reasoning or problem-solving, where unique and accurate outputs are valued.