tsavage68/chat_700STEPS_1e4rate_01beta_DPO

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 13, 2024 · Architecture: Transformer

tsavage68/chat_700STEPS_1e4rate_01beta_DPO is a 7 billion parameter language model fine-tuned from Meta's Llama-2-7b-chat-hf. This model underwent 700 training steps with a learning rate of 0.0001, focusing on improving conversational capabilities. While specific dataset details are unknown, its training process involved a DPO-like objective, indicated by rewards/chosen and rewards/rejected metrics. It is intended for chat-based applications, building upon the robust foundation of the Llama 2 architecture.


Model Overview

The tsavage68/chat_700STEPS_1e4rate_01beta_DPO is a 7 billion parameter language model derived from the meta-llama/Llama-2-7b-chat-hf base. It has been fine-tuned over 700 steps using a learning rate of 0.0001, with a focus on conversational performance. The training process involved a DPO-like objective, as evidenced by the reported Rewards/chosen and Rewards/rejected metrics, which indicate an attempt to align model outputs with preferred responses.
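Since the base model is Llama-2-7b-chat-hf, prompts presumably follow the Llama 2 chat format. A minimal sketch of assembling a single-turn prompt (the helper name is mine, and it is an assumption that this fine-tune preserved the base model's format; the tokenizer normally adds the BOS token when encoding):

```python
def build_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    """Assemble a single-turn prompt in the Llama 2 chat format.

    The [INST] / <<SYS>> markers come from the Llama-2-7b-chat-hf base model.
    """
    if system_prompt:
        return (
            f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"
```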

Key Training Details

  • Base Model: Llama-2-7b-chat-hf
  • Training Steps: 700
  • Learning Rate: 0.0001
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Batch Size: train_batch_size of 4 × gradient_accumulation_steps of 2, for a total_train_batch_size of 8
  • Evaluation Metrics: final loss of 1.1848, Rewards/chosen of -4.4236, Rewards/rejected of -4.3538, and Rewards/accuracies of 0.4000 (i.e., the chosen response received the higher reward on 40% of evaluation pairs)
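The Rewards/* metrics above are consistent with the standard DPO objective, in which each reward is the β-scaled log-probability ratio between the policy and a frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. A minimal sketch (β = 0.1 is inferred from "01beta" in the model name; the reported loss of 1.1848 is averaged over batches, so it need not match a single-pair value):

```python
import math

BETA = 0.1  # inferred from "01beta" in the model name (an assumption)

def dpo_reward(logp_policy: float, logp_ref: float, beta: float = BETA) -> float:
    # In DPO, the implicit reward is the beta-scaled log-probability ratio
    # between the policy being trained and the frozen reference model.
    return beta * (logp_policy - logp_ref)

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log sigmoid(margin): small when the chosen response out-scores the
    # rejected one, large otherwise.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Plugging in the reported final metrics: the margin is slightly negative,
# consistent with the below-chance Rewards/accuracies of 0.4000.
loss = dpo_loss(-4.4236, -4.3538)
```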

Intended Use Cases

This model is primarily intended for chat-based applications, leveraging the conversational strengths of its Llama 2 base. While details of the fine-tuning dataset are not provided, the DPO-like training suggests optimization toward generating preferred responses in interactive dialogue. Developers can use this model for building chatbots or conversational agents where a 7B parameter model is suitable for deployment.
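As a deployment sketch, a chat loop only needs to maintain the turn history and re-assemble a Llama-2-style prompt each turn. Here `generate_fn` stands in for whatever inference call you wire up (e.g. a text-generation pipeline); the exact prompt layout is an assumption carried over from the base model:

```python
from typing import Callable, List, Tuple

def chat_turn(
    history: List[Tuple[str, str]],
    user_message: str,
    generate_fn: Callable[[str], str],
) -> str:
    """Run one chat turn: fold prior (user, assistant) pairs plus the new
    message into a Llama-2-style prompt, generate, and record the reply."""
    prompt = ""
    for user, assistant in history:
        prompt += f"[INST] {user} [/INST] {assistant} "
    prompt += f"[INST] {user_message} [/INST]"
    reply = generate_fn(prompt)
    history.append((user_message, reply))
    return reply
```

In practice, `generate_fn` would tokenize the prompt, call the model's generation method, and decode only the newly generated tokens before returning them.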