tsavage68/chat_1000STEPS_1e6rate_01beta_DPO
The tsavage68/chat_1000STEPS_1e6rate_01beta_DPO model is a 7-billion-parameter language model fine-tuned from meta-llama/Llama-2-7b-chat-hf. It was trained with Direct Preference Optimization (DPO) for 1000 steps at a learning rate of 1e-06, and its reported reward metrics summarize how well it learned to rank preferred responses above dispreferred ones.
Model Overview
The tsavage68/chat_1000STEPS_1e6rate_01beta_DPO is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base model. It was fine-tuned with Direct Preference Optimization (DPO) for 1000 training steps.
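The card provides no usage snippet, so the following is a minimal inference sketch, assuming the model is available on the Hugging Face Hub under this identifier and retains the chat template of its Llama-2-7b-chat-hf base (the prompt text is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000STEPS_1e6rate_01beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format a single-turn conversation with the tokenizer's chat template,
# assuming the fine-tune kept the template from Llama-2-7b-chat-hf.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```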
Training Details
The model was trained with a learning rate of 1e-06, a train batch size of 4 (gradient-accumulated to an effective batch size of 8), and the Adam optimizer. Training ran for 1000 steps, producing the following evaluation metrics:
- Loss: 0.6684
- Rewards/chosen: -0.3437
- Rewards/rejected: -0.4414
- Rewards/accuracies: 0.5055
- Rewards/margins: 0.0978
These metrics reflect how well the model distinguishes preferred ('chosen') from dispreferred ('rejected') responses: chosen responses receive a higher implicit reward on average (a margin of 0.0978), though the accuracy of 0.5055 is only marginally above the 0.5 chance level for pairwise comparisons. The sketch below shows how these quantities are derived.
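The reported rewards are the implicit rewards of the DPO objective rather than outputs of a separate reward model. Below is a minimal sketch of how they are computed from policy and frozen-reference log-probabilities, assuming the standard DPO loss and β = 0.1 (inferred from the "01beta" suffix in the model name; the function and tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps,
             beta=0.1):  # beta assumed from the "01beta" model-name suffix
    """DPO loss and implicit rewards for a batch of preference pairs.

    Each argument is a 1-D tensor of summed per-sequence log-probabilities.
    """
    # Implicit rewards: beta-scaled log-ratios against the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps)

    # "rewards/margins" is the mean gap between chosen and rejected rewards.
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()

    # "rewards/accuracies" is the fraction of pairs where chosen outranks rejected.
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```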
Intended Uses & Limitations
As a DPO fine-tune of Llama-2-7b-chat-hf, the model is likely intended for conversational AI applications where preference alignment is important. However, the model card does not explicitly state intended uses, limitations, or the dataset used for DPO training.