tsavage68/chat_1000STEPS_1e7_05beta_DPO
tsavage68/chat_1000STEPS_1e7_05beta_DPO is a 7-billion-parameter language model fine-tuned from meta-llama/Llama-2-7b-chat-hf using Direct Preference Optimization (DPO). It was trained for 1000 steps with a learning rate of 1e-07 and reached a final validation loss of 0.6864. The model card does not describe a primary differentiator or intended use cases, suggesting an experimental or foundational DPO fine-tune.
Model Overview
tsavage68/chat_1000STEPS_1e7_05beta_DPO is a 7-billion-parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base model and fine-tuned with Direct Preference Optimization (DPO) for 1000 training steps. Training used a learning rate of 1e-07, a batch size of 4, and the Adam optimizer.
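For context, DPO (Rafailov et al., 2023) fine-tunes a policy directly on preference pairs against a frozen reference model, without training a separate reward model. Its objective is

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

where $(x, y_w, y_l)$ is a prompt with a preferred and a dispreferred response, $\sigma$ is the logistic function, and $\beta$ sets the strength of the implicit KL constraint toward the reference model. The "05beta" in the model name plausibly indicates $\beta = 0.5$, though the card does not confirm this.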
Training Details
- Base Model: meta-llama/Llama-2-7b-chat-hf
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Parameters: 7 billion
- Training Steps: 1000
- Learning Rate: 1e-07
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Final Validation Loss: 0.6864
- Final Reward Accuracy (logged as rewards/accuracies): 0.4571
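The hyperparameters above map directly onto TRL's DPOTrainer. The sketch below shows how such a run could be configured; it is an illustration under stated assumptions, not the author's actual training script. The preference dataset is a hypothetical placeholder (the card does not name the real one), beta=0.5 is inferred from the model name, and some keyword names (e.g. processing_class vs. tokenizer) differ across TRL releases.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical placeholder preference pairs; the card does not name the real dataset.
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO fine-tuning do?"],
    "chosen": ["It optimizes the model directly on human preference pairs."],
    "rejected": ["It compresses the model to run on smaller GPUs."],
})

config = DPOConfig(
    output_dir="chat_1000STEPS_1e7_05beta_DPO",
    max_steps=1000,                  # Training Steps: 1000
    learning_rate=1e-7,              # Learning Rate: 1e-07
    per_device_train_batch_size=4,   # batch size of 4
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # Adam epsilon=1e-08
    beta=0.5,                        # assumed from "05beta" in the model name
)

# With no explicit ref_model, TRL clones the policy as the frozen DPO reference.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```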
Current Status
The model card indicates that more information is needed regarding the model's description, intended uses, limitations, and training dataset. The reported metrics point the same way: the final validation loss of 0.6864 sits close to the DPO loss at initialization (ln 2 ≈ 0.6931, where policy and reference agree), and a reward accuracy of 0.4571 is below chance. Together these suggest an early-stage or experimental DPO fine-tune whose capabilities and optimal applications have not yet been characterized.
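Example Usage
A minimal inference sketch, assuming the model is published on the Hugging Face Hub under the ID above and loads with the standard transformers chat workflow:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000STEPS_1e7_05beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-chat models expect the [INST] ... [/INST] prompt format;
# apply_chat_template produces it when the tokenizer defines a chat template.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```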