W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4 is an 8 billion parameter Llama 3 base model fine-tuned by W-61 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, with a focus on generating harmless and helpful responses. It is intended for applications that require robust safety alignment, particularly conversational AI where mitigating harmful outputs is critical.
Model Overview
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4 is an 8 billion parameter language model fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200. This checkpoint applies Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, a collection of human preference comparisons targeting helpfulness and harmlessness.
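The checkpoint can be loaded with the Hugging Face transformers library. A minimal sketch, assuming the repository ships standard Llama 3 weights and tokenizer files:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.4"

# bfloat16 keeps the 8B weights at roughly 16 GB, fitting a single modern GPU;
# device_map="auto" places layers across whatever devices are available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```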
Key Characteristics
- Base Model: Llama 3 8B architecture.
- Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment.
- Training Data: Optimized on the Anthropic/hh-rlhf dataset, emphasizing safety and helpfulness.
- Context Length: Supports a context window of 8192 tokens.
- Training Hyperparameters: Trained with a learning rate of 5e-07, a total batch size of 64, and a cosine learning rate scheduler over 1 epoch (a training sketch follows this list).
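For reference, the recipe above maps onto the TRL library's DPOTrainer roughly as follows. This is a hypothetical sketch, not the author's training script: the s_star, eta, and q_t values in the model name are not standard DPO or TRL parameters and are omitted, the DPO beta is undocumented, and the per-device batch split across the 4 GPUs is assumed.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# hh-rlhf rows carry "chosen"/"rejected" conversation strings, an
# implicit-prompt preference format that DPOTrainer accepts directly.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",
    learning_rate=5e-7,             # from the model card
    num_train_epochs=1,             # from the model card
    lr_scheduler_type="cosine",     # from the model card
    per_device_train_batch_size=4,  # assumed: 4 GPUs x 4 x 4 accumulation = 64 total
    gradient_accumulation_steps=4,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # "tokenizer=" in older TRL releases
)
trainer.train()
```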
Intended Use Cases
This model is particularly well-suited for applications where generating safe, harmless, and helpful text is paramount. Its DPO fine-tuning on the hh-rlhf dataset makes it a strong candidate for the use cases below; a prompting sketch follows the list.
- Safe Conversational AI: Developing chatbots and virtual assistants that prioritize non-toxic and helpful interactions.
- Content Moderation: Assisting in filtering or generating content that adheres to safety guidelines.
- Aligned Text Generation: Creating text for sensitive applications where ethical considerations are critical.
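Because the model was tuned on hh-rlhf conversations rather than the Llama 3 Instruct chat template, prompting in that dataset's Human/Assistant format is a reasonable starting point. The sketch below continues from the loading snippet above; the prompt format is an assumption, as the repository does not document a chat template.

```python
prompt = "\n\nHuman: How can I politely decline a meeting invitation?\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```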