W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5 is an 8 billion parameter language model fine-tuned by W-61. It is based on the Llama-3 architecture and has been optimized for harmlessness via Direct Preference Optimization (DPO) training on the Anthropic/hh-rlhf dataset. This model is intended for applications requiring a robust, safety-aligned conversational AI.
Model Overview
This model, developed by W-61, is an 8 billion parameter language model built upon a Llama-3 base architecture. It represents a fine-tuned iteration of the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 model.
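The model should load like any other Llama-3-based checkpoint via the Hugging Face `transformers` library. The snippet below is a minimal usage sketch, not an official example from the authors: the hh-rlhf-style `Human:`/`Assistant:` prompt format and the generation settings are assumptions, and loading requires the checkpoint to be accessible on the Hub and a GPU with roughly 16 GB of memory in bfloat16.

```python
# Minimal usage sketch (assumptions: public checkpoint, bfloat16-capable GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# hh-rlhf conversations use a "\n\nHuman: ... \n\nAssistant:" format (assumed here).
prompt = "\n\nHuman: How can I stay safe online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```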
Key Differentiator: Harmlessness Alignment
The primary distinction of this model lies in its training methodology. It has undergone Direct Preference Optimization (DPO) using the Anthropic/hh-rlhf dataset. This specific fine-tuning aims to enhance the model's harmlessness, making it more suitable for applications where safety and ethical responses are paramount.
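For intuition, DPO trains the policy directly on preference pairs: for each prompt it increases the margin between the implicit reward of the preferred (harmless) response and the rejected one, relative to a frozen reference model. A minimal sketch of the per-example loss (the `beta` value here is an illustrative default, not the value used for this model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * reward margin)).

    Each log-probability is the summed token log-prob of a full response
    under the policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)); small when the policy clearly prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; as the policy shifts probability mass toward the preferred (harmless) responses, the loss decreases.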
Training Details
During training, the model used a learning rate of 5e-07, a per-device train_batch_size of 8, and a total_train_batch_size of 64 across 4 GPUs (implying gradient accumulation over 2 steps). The training process ran for 1 epoch with a cosine learning rate scheduler and a warmup ratio of 0.1. The optimizer was ADAMW_TORCH.
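These hyperparameters map onto standard Hugging Face `TrainingArguments` fields. The following is a hedged sketch of how such a run might be configured with `trl`'s `DPOTrainer`; the output directory, `bf16` flag, and the commented-out model/dataset wiring are assumptions for illustration, not the authors' actual training script.

```python
from trl import DPOConfig, DPOTrainer  # requires: pip install trl

training_args = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",  # hypothetical name
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size from this card
    gradient_accumulation_steps=2,   # 8 per device x 4 GPUs x 2 = 64 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    bf16=True,                       # assumption: typical on H200 GPUs
)

# trainer = DPOTrainer(model=model, args=training_args,
#                      train_dataset=preference_dataset)  # Anthropic/hh-rlhf pairs
# trainer.train()
```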
Intended Use Cases
Given its focus on harmlessness alignment, this model is particularly well-suited for:
- Safety-critical applications: Where generating non-toxic, unbiased, and ethically sound responses is crucial.
- Conversational AI: For chatbots or virtual assistants that require a strong emphasis on user safety and responsible interaction.
- Content moderation: Assisting in filtering or generating content that adheres to strict safety guidelines.
Limitations
As with any language model, users should be aware of potential limitations. While fine-tuned for harmlessness, continuous evaluation and monitoring are recommended for deployment in sensitive environments. Specific performance benchmarks and further details on intended uses and limitations are still being documented.