jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 25, 2026 · Architecture: Transformer

The jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85 model is an 8 billion parameter Llama 3-based language model, fine-tuned by jackf857. It is specifically optimized using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to enhance harmlessness and align with human preferences. This model is primarily intended for applications requiring a robust, preference-aligned LLM with a focus on generating safe and helpful responses, making it suitable for conversational AI and content moderation tasks.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85, is an 8 billion parameter Llama 3-based language model. It was fine-tuned by jackf857 with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, starting from the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 SFT checkpoint. The goal of the fine-tuning was to improve the model's harmlessness and alignment with human preferences, reflecting the dataset's focus on helpful and harmless responses.
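For orientation, here is a minimal inference sketch using Hugging Face transformers. It assumes the weights are downloadable under the repo id shown above; the prompt, dtype, and sampling settings are illustrative choices, not taken from this page.

```python
# Minimal inference sketch (assumes the checkpoint is hosted under the repo id
# below; prompt and generation settings are illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 for single-GPU inference
    device_map="auto",
)

# hh-rlhf-style dialogue formatting ("Human:" / "Assistant:" turns), since the
# model was preference-tuned on that dataset rather than an instruct template.
prompt = "Human: How should I respond to an angry customer email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```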

Key Capabilities

  • Preference Alignment: Optimized using DPO to align with human preferences, particularly for harmlessness; the objective is sketched after this list.
  • Safety-Focused Generation: Designed to produce responses that are helpful and avoid harmful content.
  • Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B base model.
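For background, DPO fine-tunes the policy directly on preference pairs rather than training a separate reward model. The standard objective from the DPO paper (stated here for reference; the β used for this particular run is not given on this page) is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the chosen and rejected responses from hh-rlhf, $\pi_{\mathrm{ref}}$ is the frozen SFT model, and the term inside $\sigma$ is the preference margin tracked during training (see Training Details below).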

Good For

  • Conversational AI: Developing chatbots or virtual assistants where safety and harmlessness are critical.
  • Content Moderation: Assisting in filtering or generating content that adheres to safety guidelines.
  • Research in Alignment: Exploring DPO techniques for improving LLM behavior and safety.

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64. Evaluation shows a final validation loss of 0.5392 and a mean DPO margin (dpo/margin_mean) of 4.1973, indicating a clear learned separation between chosen and rejected responses.
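As a hedged illustration, the sketch below shows how a comparable run could be configured with TRL's DPOTrainer under the hyperparameters reported above. It assumes a recent TRL version, splits the total batch of 64 across the 4 H200s as 16 per device with no gradient accumulation, and uses placeholder names (e.g. the output directory); it is not the author's actual training script.

```python
# Hedged reproduction sketch: not the author's script. Assumes a recent TRL
# version where DPOTrainer accepts `processing_class` and can extract the
# shared prompt from hh-rlhf's full "chosen"/"rejected" dialogues.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"  # SFT starting point named above

model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

args = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",  # placeholder path
    learning_rate=5e-7,              # reported learning rate
    per_device_train_batch_size=16,  # assumption: 16 x 4 GPUs = total batch 64
    num_train_epochs=1,              # reported: 1 epoch
    beta=0.1,                        # assumption: TRL default; this run's beta is not stated
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,  # reference policy is created internally when ref_model is omitted
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()  # logs reward margins, comparable to the margin metric reported above
```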