tsavage68/chat_700STEPS_1e4rate_01beta_DPO
tsavage68/chat_700STEPS_1e4rate_01beta_DPO is a 7-billion-parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. It was trained for 700 steps at a learning rate of 0.0001 with the goal of improving conversational quality. While the fine-tuning dataset is not documented, the training process used a DPO-like objective, as indicated by the reported `Rewards/chosen` and `Rewards/rejected` metrics. The model is intended for chat-based applications, building on the Llama 2 architecture.
Model Overview
tsavage68/chat_700STEPS_1e4rate_01beta_DPO is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base. It was fine-tuned over 700 steps at a learning rate of 0.0001, with a focus on conversational performance. The training process used a DPO-like objective, as evidenced by the reported `Rewards/chosen` and `Rewards/rejected` metrics, which indicate an effort to align model outputs with preferred responses.
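For context, in standard DPO training the `Rewards/chosen` and `Rewards/rejected` values are the beta-scaled log-probability ratios between the policy and a frozen reference model. The sketch below shows that computation under the standard DPO formulation; note that `beta=0.1` is inferred from the "01beta" suffix in the model name and is not stated in the card.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1):  # 0.1 inferred from the "01beta" name suffix
    # Implicit rewards: beta-scaled log-prob ratios of the policy against the
    # frozen reference model. These correspond to the Rewards/chosen and
    # Rewards/rejected metrics reported during training.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The loss encourages a positive margin of chosen over rejected rewards.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

Under this formulation, both reported reward values being negative only means the fine-tuned policy assigns lower log-probability than the reference model to both completions; what matters for alignment is the margin between the two, which `Rewards/accuracies` summarizes as the fraction of pairs where the chosen reward exceeds the rejected one.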
Key Training Details
- Base Model: Llama-2-7b-chat-hf
- Training Steps: 700
- Learning Rate: 0.0001
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: `train_batch_size` of 4 with `gradient_accumulation_steps` of 2, resulting in a `total_train_batch_size` of 8 (see the configuration sketch after this list)
- Evaluation Metrics: final loss of 1.1848, with `Rewards/chosen` at -4.4236, `Rewards/rejected` at -4.3538, and `Rewards/accuracies` of 0.4000
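The hyperparameters above can be assembled into a training configuration. The following is a hypothetical reconstruction using the Hugging Face `trl` library's `DPOTrainer`; the actual training script is not published, `preference_dataset` is a placeholder for the unknown dataset, and the `beta=0.1` value is again inferred from the model name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = DPOConfig(
    output_dir="chat_700STEPS_1e4rate_01beta_DPO",
    max_steps=700,                   # reported training steps
    learning_rate=1e-4,              # reported learning rate
    per_device_train_batch_size=4,   # reported train_batch_size
    gradient_accumulation_steps=2,   # 4 * 2 = total_train_batch_size of 8
    adam_beta1=0.9,                  # reported Adam betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # reported Adam epsilon
    beta=0.1,                        # inferred from the "01beta" name suffix
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=preference_dataset,  # placeholder; dataset is not documented
    processing_class=tokenizer,        # argument name varies across trl versions
)
trainer.train()
```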
Intended Use Cases
This model is primarily intended for chat-based applications, leveraging the conversational strengths of its Llama 2 base. Although the fine-tuning dataset is not documented, the DPO-like training suggests optimization toward generating preferred responses in interactive dialogue. Developers can use this model for chatbots or conversational agents where a 7B-parameter model fits the deployment budget; a minimal loading sketch follows.
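As a minimal sketch, the checkpoint should load through the standard `transformers` API like any Llama-2 model. The prompt below assumes the fine-tune kept the base model's `[INST] ... [/INST]` chat template, which the card does not confirm.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_700STEPS_1e4rate_01beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B model on one ~16 GB GPU
    device_map="auto",
)

# Llama-2-chat style prompt; treat this template as an assumption, since the
# card does not state which format the fine-tune expects.
prompt = "[INST] Explain what a language model is in two sentences. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```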