Name: YuchenLi01/genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: YuchenLi01

Overview

This model, genv3pair1NoGT_1.5B_cdpo_ebs32_lr1e-06_beta0.1_epoch16.0_42, is a 1.5 billion parameter language model developed by YuchenLi01. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, specifically optimized using Direct Preference Optimization (DPO).

Training Details

The model was fine-tuned on the YuchenLi01/MATH_Qwen2.5-1.5BInstruct_DPO_MoreUniqueResponseNoGTv3pair1 dataset. Key training hyperparameters include:

Learning Rate: 1e-06
Batch Size: 32 (total train and eval)
Optimizer: Adam with default betas and epsilon
Epochs: 16.0

Performance Metrics

During evaluation, the model achieved a validation loss of 1.0560 and a rewards/accuracies score of 0.5500. The DPO training process aimed to improve response quality, indicated by rewards/chosen of -0.0768 and rewards/rejected of 0.0, with a margin of -0.0768.

Potential Use Cases

Given its fine-tuning on a dataset with "MATH" and "MoreUniqueResponse" in its name, this model is likely intended for tasks requiring:

Generation of diverse and preferred responses.
Applications involving mathematical reasoning or problem-solving, where unique and accurate outputs are valued.

Overview

Overview

Training Details

Performance Metrics

Potential Use Cases

Full Model Card (README)