jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-1.0

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 25, 2026 · Architecture: Transformer · Cold

The jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-1.0 model is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857. It was aligned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, with a focus on harmlessness. The model is intended for applications where mitigating harmful outputs is critical, and its 8192-token context length makes it suitable for processing moderately long inputs.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-1.0, is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857. It is a DPO-tuned variant of the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 SFT checkpoint.
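
A minimal usage sketch with Hugging Face transformers is shown below. The Human/Assistant prompt format follows the hh-rlhf convention the model was tuned on; the dtype and device settings are assumptions for a single-GPU setup, not requirements.

```python
# Minimal loading-and-generation sketch using Hugging Face transformers.
# Assumes the repository ships standard Llama 3 weights and tokenizer files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 fits an 8B model on one modern GPU
    device_map="auto",           # requires the accelerate package
)

# hh-rlhf-style prompt: alternating Human/Assistant turns.
prompt = "\n\nHuman: How can I stay safe when shopping online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```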

Key Capabilities

  • Harmlessness Alignment: Fine-tuned specifically on the Anthropic/hh-rlhf dataset, indicating a strong focus on reducing harmful outputs and improving safety.
  • Direct Preference Optimization (DPO): Utilizes DPO for alignment, which optimizes the policy directly on preference pairs instead of training a separate reward model and running RL (see the loss sketch after this list).
  • Llama 3 Architecture: Benefits from the robust and capable Llama 3 base architecture.
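
To make the DPO bullet concrete, here is an illustrative implementation of the DPO objective from Rafailov et al. (2023); this is a generic sketch, not the author's training code.

```python
# Illustrative DPO loss. Inputs are the summed log-probabilities of the
# chosen and rejected responses under the policy being trained and under
# a frozen reference model (here, the SFT checkpoint).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward of a response: beta * (policy logprob - reference logprob).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

TRL's DPOTrainer implements this same objective; the sketch above just makes the margin structure explicit.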

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 devices, using a cosine learning-rate scheduler with a warmup ratio of 0.1. The final evaluation loss was 0.5680, with a reported DPO beta of 0.6569. Since the DPO loss starts at ln 2 ≈ 0.693 when the policy matches the reference, a final loss of 0.5680 indicates the model learned to assign higher implicit reward to chosen responses than to rejected ones.
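
The training script itself is not published; the following is a hedged reconstruction using TRL's DPOTrainer. The per-device batch split (16 × 4 GPUs), the prompt-splitting preprocessing, and the exact trl version are assumptions; argument names such as processing_class vary across trl releases.

```python
# Hedged reconstruction of the reported setup with TRL's DPOTrainer.
# Hyperparameter values mirror the model card; everything else is assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"  # SFT starting point
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

def to_pairs(ex):
    # Split the shared prompt from the final assistant completions, since
    # hh-rlhf stores full conversations rather than prompt/chosen/rejected.
    marker = "\n\nAssistant:"
    cut = ex["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": ex["chosen"][:cut],
        "chosen": ex["chosen"][cut:],
        "rejected": ex["rejected"][ex["rejected"].rfind(marker) + len(marker):],
    }

dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(to_pairs)

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",
    num_train_epochs=1,
    learning_rate=5e-7,
    per_device_train_batch_size=16,   # assumed split: 16 x 4 devices = 64 total
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    beta=0.6569,                      # beta value reported on the card
)

trainer = DPOTrainer(
    model=model,                      # reference model is created internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```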

Intended Use Cases

This model is particularly well-suited for applications where safety and the generation of harmless content are paramount. It can be used in:

  • Content Moderation: Assisting in filtering or generating safe content.
  • Chatbots and Assistants: Developing conversational AI that prioritizes non-toxic and helpful responses.
  • Research in Alignment: Exploring DPO techniques and their impact on model safety.