W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.4-4xh200-batch-64-20260421-204233
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.4-4xh200-batch-64-20260421-204233 is an 8-billion-parameter Llama 3 base model fine-tuned by W-61. It has undergone further Direct Preference Optimization (DPO) training on the Anthropic/hh-rlhf dataset to enhance its harmlessness and alignment. The model is optimized for generating safe and helpful responses, making it suitable for applications that require robust content moderation and ethical AI interactions.
Model Overview
This model, developed by W-61, is an 8-billion-parameter Llama 3 base model fine-tuned specifically for harmlessness and alignment. It builds on the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 model with an additional Direct Preference Optimization (DPO) phase on the Anthropic/hh-rlhf dataset.
Key Characteristics
- Base Model: Llama 3 with 8 billion parameters.
- Fine-tuning: Utilizes Direct Preference Optimization (DPO) for enhanced alignment.
- Dataset: Trained on the Anthropic/hh-rlhf dataset, focusing on human feedback for harmlessness.
- Performance: Achieved a validation loss of 0.5867 during DPO training; preference-alignment metrics included a mean DPO margin of 70.1604.
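Assuming the checkpoint is published on the Hugging Face Hub under the repository id in the title, it can be loaded with the standard transformers APIs. The sketch below is illustrative only: the Human:/Assistant: prompt format follows the Anthropic/hh-rlhf convention this model was tuned on, while the bf16 dtype and greedy decoding are assumptions rather than settings reported on this card.

```python
# Minimal inference sketch (assumes transformers, torch, and accelerate are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.4-4xh200-batch-64-20260421-204233"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a common choice for Llama 3 8B inference
    device_map="auto",
)

# hh-rlhf-style dialogue format ("\n\nHuman: ...\n\nAssistant:")
prompt = "\n\nHuman: How do I politely decline an invitation?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```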
Intended Use Cases
This model is particularly well-suited for applications where generating safe, ethical, and non-harmful content is paramount. Consider using this model for:
- Content Moderation: Assisting in filtering or generating responses that adhere to safety guidelines.
- Ethical AI Development: Building applications that prioritize harmlessness and user well-being.
- Dialogue Systems: Creating chatbots or conversational agents designed to avoid generating toxic or inappropriate content.
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64, using the ADAMW_TORCH optimizer with cosine learning-rate scheduling and a warmup ratio of 0.1.
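The training script is not included on this card, but the stated hyperparameters map directly onto TRL's DPOConfig. The sketch below is a hypothetical reconstruction under that assumption: only the learning rate, total batch size, epoch count, optimizer, scheduler, and warmup ratio come from this card; the DPO beta, the harmless-base subset (inferred from "hh-harmless" in the model name), the per-device/accumulation split of the batch across the four H200 GPUs, and the bf16 flag are guesses.

```python
# Hypothetical reconstruction of the DPO phase with TRL's DPOTrainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# DPO starts from the SFT checkpoint named in the overview above.
sft_model_id = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"

model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

# Assumption: the harmless-base subset, inferred from the model name. Depending on
# your TRL version, you may first need to split hh-rlhf's chosen/rejected texts
# into explicit prompt/chosen/rejected columns.
dataset = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",
    learning_rate=5e-7,              # from this card
    num_train_epochs=1,              # from this card
    per_device_train_batch_size=8,   # assumption: 4 GPUs x 8 x 2 accumulation = 64 total
    gradient_accumulation_steps=2,
    optim="adamw_torch",             # ADAMW_TORCH, as stated above
    lr_scheduler_type="cosine",      # from this card
    warmup_ratio=0.1,                # from this card
    beta=0.1,                        # assumption: TRL's default DPO beta; not reported here
    bf16=True,                       # assumption: typical for H200 training
)

trainer = DPOTrainer(
    model=model,                 # a frozen reference copy is created automatically
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # "tokenizer=" in older TRL releases
)
trainer.train()
```

With the split assumed above, the effective batch size is 4 GPUs × 8 per device × 2 accumulation steps = 64, matching the total batch size reported on this card.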