jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4
jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. It is designed to generate harmless and helpful responses, making it suitable for applications that require safe, aligned AI interactions.
Model Overview
This model, jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4, is an 8-billion-parameter language model built on the Llama 3 architecture. It is a fine-tuned version of W-61/llama-3-8b-base-sft-hh-harmless-4xh200, further optimized with Direct Preference Optimization (DPO).
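The model can be loaded with the standard Transformers APIs. Below is a minimal loading sketch; the bfloat16 dtype and automatic device placement are illustrative choices, not requirements stated by the authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; halves memory vs. fp32
    device_map="auto",           # spreads weights across available devices
)
```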
Key Capabilities
- Harmlessness: The model has been fine-tuned on the Anthropic/hh-rlhf dataset, which is designed to improve alignment and reduce harmful outputs.
- Preference Alignment: Uses DPO to align model outputs with human preferences, aiming for safer, more desirable responses (see the objective sketch after this list).
- Base Model Performance: Inherits the foundational capabilities of the Llama 3 8B base model, providing strong general language understanding and generation.
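For context, DPO trains directly on preference pairs: it widens the log-probability gap between chosen and rejected responses relative to a frozen reference model. The sketch below is a generic rendering of the DPO objective (Rafailov et al., 2023), not this repository's training code; the beta value is illustrative, as the card does not state the one used.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over summed per-sequence log-probabilities.

    beta (assumed value) controls how far the policy may drift
    from the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()
    return loss, margins.detach()
```

The `margins` value here is what DPO "margin" metrics, such as those reported under Training Details below, typically track: a growing margin means the policy increasingly prefers chosen over rejected responses.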
Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs. Evaluation loss reached 0.6075, and DPO margin metrics indicated successful preference learning. Training used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
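For readers who want to reproduce a comparable setup, the sketch below shows how such a run could be configured with TRL's `DPOTrainer`. This is not the authors' training script: the per-device batch size and gradient-accumulation split (4 per device x 4 GPUs x 4 accumulation steps = 64 total), the beta value, and the dataset handling are all assumptions, and Anthropic/hh-rlhf may need to be mapped into the prompt/chosen/rejected fields that `DPOTrainer` expects.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-harmless",
    num_train_epochs=1,             # single epoch, per the card
    learning_rate=5e-7,             # per the card
    per_device_train_batch_size=4,  # assumed split: 4 x 4 GPUs x 4 accum = 64
    gradient_accumulation_steps=4,
    beta=0.1,                       # assumed; not stated in the card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,    # assumes prompt/chosen/rejected fields
    processing_class=tokenizer,
)
trainer.train()
```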
Intended Use Cases
This model is particularly well-suited for applications where generating safe, helpful, and non-toxic content is paramount. It can be used in chatbots, content moderation, and other interactive AI systems that require a strong emphasis on harmlessness and alignment.
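A short usage sketch with the `text-generation` pipeline follows. The `Human:`/`Assistant:` framing is an assumption based on the Anthropic/hh-rlhf formatting; the card does not specify a prompt template, and the sampling settings are illustrative.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4",
    torch_dtype="auto",
    device_map="auto",
)

prompt = "\n\nHuman: How can I give critical feedback kindly?\n\nAssistant:"
out = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][len(prompt):])  # strip the echoed prompt
```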