tsavage68/chat_1000STEPS_1e7rate_01beta_DPO
tsavage68/chat_1000STEPS_1e7rate_01beta_DPO is a 7-billion-parameter language model fine-tuned from meta-llama/Llama-2-7b-chat-hf. It was trained with a learning rate of 1e-07 over 1000 steps using Direct Preference Optimization (DPO) to improve chat capabilities. The reported reward and log-probability metrics reflect its optimization toward generating preferred responses in conversational contexts.
Model Overview
The tsavage68/chat_1000STEPS_1e7rate_01beta_DPO model is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base model. It was fine-tuned with Direct Preference Optimization (DPO) under the training regimen summarized below.
Key Training Details
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Optimization Method: Direct Preference Optimization (DPO)
- Learning Rate: 1e-07
- Training Steps: 1000
- Batch Size: A total training batch size of 8 (train_batch_size: 4, gradient_accumulation_steps: 2)
- Optimizer: Adam with standard betas and epsilon
- Scheduler: Cosine learning rate scheduler with 100 warmup steps
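To make the DPO objective behind this training run concrete, here is a minimal pure-Python sketch of the per-pair DPO loss. The beta value of 0.1 is an assumption inferred from the "01beta" in the model name, and the example log-probabilities are made up for illustration; the actual run would have used a framework such as TRL rather than this hand-rolled function.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    beta=0.1 is assumed here from the model name ("01beta").
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward

# Hypothetical log-probs: the policy slightly favors the chosen response
loss, r_chosen, r_rejected = dpo_loss(-10.0, -12.5, -10.2, -12.4, beta=0.1)
```

With a low learning rate like 1e-07, the policy stays close to the reference model, so margins stay small and the loss stays near ln(2) ≈ 0.693, which is consistent with the validation loss reported below.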
Performance Metrics
Over its 1000 training steps, the model reached a final validation loss of 0.6919. Key DPO-specific metrics are a rewards/accuracies of 0.4637 and a rewards/margins of 0.0027. An accuracy near 0.5 and a small margin indicate the policy moved only slightly away from the reference model, which is consistent with the very low learning rate. The DPO methodology and the chat-tuned base model both point to a focus on refining conversational outputs.
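The two DPO metrics above are simple aggregates over preference pairs, and can be sketched as follows. The reward values in the example are hypothetical, chosen only to illustrate how near-0.5 accuracy and a small positive margin arise.

```python
def dpo_eval_metrics(chosen_rewards, rejected_rewards):
    """Aggregate DPO metrics over a batch of preference pairs.

    Inputs are the implicit (beta-scaled log-ratio) rewards of the
    chosen and rejected responses, one entry per pair.
    """
    pairs = list(zip(chosen_rewards, rejected_rewards))
    # rewards/accuracies: fraction of pairs where the chosen response
    # receives the higher implicit reward
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    # rewards/margins: mean gap between chosen and rejected rewards
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return {"rewards/accuracies": accuracy, "rewards/margins": margin}

# Hypothetical batch: half the pairs are ranked correctly, and the
# average margin is small but positive
metrics = dpo_eval_metrics([0.02, -0.01, 0.05, 0.01],
                           [0.01, 0.00, 0.06, -0.01])
```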
Intended Use
Although the model card does not list specific intended uses or limitations, fine-tuning from a chat-optimized Llama-2 variant suggests the model is suited to conversational AI applications where response quality and preference alignment matter.