W-61/mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64
W-61/mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64 is a 7-billion-parameter language model fine-tuned from the Mistral-7B base model with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, targeting harmlessness. It is designed to produce responses that align with human preferences for safe, non-offensive output.
Model Overview
This model, W-61/mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64, is a 7-billion-parameter language model built on the Mistral-7B base model. It was fine-tuned on the Anthropic/hh-rlhf dataset using a margin-based variant of Direct Preference Optimization ("Margin DPO"); a sketch of the assumed loss follows the list below. The objective of this fine-tuning was to improve harmlessness, i.e., to produce outputs that are safe and non-offensive.
Key Characteristics
- Base Model: Mistral-7B.
- Fine-tuning Method: Margin Direct Preference Optimization (DPO).
- Training Data: Anthropic/hh-rlhf human-preference dataset; the model name indicates the harmlessness subset of this helpful/harmless feedback data.
- Objective: Optimized for generating harmless and ethically aligned responses.
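The card does not spell out the exact margin formulation, but a common approach augments the standard DPO logistic loss with a fixed target margin between chosen and rejected completions. The sketch below illustrates that idea in PyTorch; the function name and the `beta` and `margin` values are illustrative assumptions, not the card's actual settings.

```python
import torch.nn.functional as F

def margin_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    beta=0.1, margin=1.0):
    """Sketch of a margin-augmented DPO loss (assumed formulation).

    Standard DPO maximizes the log-sigmoid of the scaled difference
    between the policy's and the reference model's preference
    log-ratios; the margin variant additionally requires the chosen
    completion to beat the rejected one by a fixed offset.
    """
    pi_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratio - ref_logratio)
    # Subtracting the margin inside the log-sigmoid penalizes pairs
    # whose implicit reward gap is smaller than `margin`.
    return -F.logsigmoid(logits - margin).mean()
```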
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and an effective batch size of 64 across 4 GPUs (H200s, per the model name). Training metrics such as `margin_dpo/loss_margin_mean` and `logps/chosen` track how well the model fits the margin DPO objective. The final validation loss was 0.5822.
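The card does not name the training framework. If the run used trl's DPOTrainer, the reported hyperparameters might map onto its configuration as sketched below; the per-device batch / gradient-accumulation split and the `beta` value are assumptions, since only the effective batch size (64), learning rate, and epoch count are stated.

```python
from trl import DPOConfig

# Hypothetical mapping of the reported hyperparameters; only
# learning_rate, num_train_epochs, and the effective batch size of 64
# come from the card.
config = DPOConfig(
    output_dir="mistral-7b-base-margin-dpo-hh-harmless",
    learning_rate=5e-7,
    num_train_epochs=1,
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs x 2 accumulation steps = 64
    gradient_accumulation_steps=2,
    beta=0.1,                        # DPO temperature; assumed, not reported
)
```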
Intended Use Cases
This model is well suited to applications where generating safe, non-toxic, and harmless content is paramount, including chatbots, content moderation systems, and other interactive AI where ethical behavior and user safety are critical design requirements.
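For inference, the model should load with the standard transformers API. The snippet below assumes the repository id from the title and an hh-rlhf-style "Human:/Assistant:" prompt framing, since the card does not document a chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# hh-rlhf-style prompt framing (assumed; no chat template is documented).
prompt = "\n\nHuman: How should I respond to an offensive comment online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```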