jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Apr 25, 2026 · Architecture: Transformer

This model is an 8-billion-parameter Llama 3 base model, fine-tuned by jackf857 with DPO on the Anthropic/hh-rlhf dataset. It is specifically optimized for harmlessness and alignment, building on a supervised fine-tuned (SFT) checkpoint. The model is designed for applications that require robust safety and reduced harmful outputs, making it suitable for general-purpose conversational AI where ethical considerations are paramount.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6, is an 8-billion-parameter Llama 3 base model. It has been fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, which pairs preferred and rejected responses with an emphasis on harmlessness and helpfulness. The DPO stage starts from the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 checkpoint, which was produced by an initial Supervised Fine-Tuning (SFT) phase.
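For inference, the checkpoint should load like any Llama 3 model on the Hugging Face Hub. A minimal sketch, assuming the repository id from the title is publicly available and that transformers with a recent torch build is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available GPU(s)
)
```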

Key Characteristics

  • Base Model: Llama 3, 8 billion parameters.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Dataset: Anthropic/hh-rlhf, emphasizing harmlessness.
  • Context Length: 8192 tokens.
  • Training Details: Trained with a learning rate of 5e-07, a total batch size of 64, and a cosine learning rate scheduler over 1 epoch (see the training sketch after this list).
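These hyperparameters map naturally onto TRL's DPOTrainer. The sketch below is a hypothetical reproduction, not the author's actual script: the per-device batch / gradient-accumulation split across the 4 H200s, the DPO beta, and the bf16 setting are assumptions, and the exact DPOTrainer signature varies across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_model_id = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"

model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

# Depending on the TRL version, hh-rlhf's chosen/rejected strings may
# need to be split into an explicit "prompt" column first.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",
    learning_rate=5e-7,             # from the card
    lr_scheduler_type="cosine",     # from the card
    num_train_epochs=1,             # from the card
    per_device_train_batch_size=4,  # assumption: 4 GPUs x 4 x accum 4 = 64 total
    gradient_accumulation_steps=4,
    beta=0.1,                       # DPO beta not reported; TRL default shown
    bf16=True,                      # assumption
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL releases
)
trainer.train()
```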

Performance Metrics

During training, the model reached a final validation loss of 0.5318. Notable DPO-specific metrics include a mean reward margin (dpo/margin_mean) of 34.3262 and a mean log-probability of the chosen responses (logps/chosen) of -139.1072, indicating that preference learning successfully shifted the policy toward the chosen (harmless) responses and away from the rejected ones.
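These metrics follow directly from the DPO formulation (Rafailov et al., 2023), where each response receives an implicit reward proportional to the log-ratio between the policy and the frozen reference model. The sketch below shows how such metrics are typically computed from per-sequence log-probabilities; the tensor names are illustrative and the beta value is an assumption.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    margins = chosen_rewards - rejected_rewards

    return {
        # DPO loss: negative log-sigmoid of the reward margin.
        "loss": -F.logsigmoid(margins).mean(),
        # dpo/margin_mean: average reward gap, chosen over rejected.
        "dpo/margin_mean": margins.mean(),
        # logps/chosen: mean policy log-probability of chosen responses.
        "logps/chosen": policy_chosen_logps.mean(),
    }

# Example with dummy per-sequence log-probs (batch of 2).
m = dpo_metrics(torch.tensor([-139.1, -140.0]), torch.tensor([-180.0, -175.0]),
                torch.tensor([-150.0, -150.0]), torch.tensor([-160.0, -160.0]))
```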

Intended Use Cases

This model is particularly well-suited for applications where generating harmless and aligned text is critical. It can be used for:

  • Safe conversational AI: Developing chatbots or virtual assistants that prioritize ethical and non-toxic responses (a minimal generation example follows this list).
  • Content moderation assistance: Helping to filter or flag potentially harmful content.
  • Research into AI safety and alignment: Providing a base for further experimentation in reducing undesirable model behaviors.
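As a concrete starting point for the conversational use case, the following continues from the loading sketch above. Because this is a base-model fine-tune, it likely lacks a chat template, so a plain completion-style prompt in the hh-rlhf dialogue format ("\n\nHuman: ... \n\nAssistant:") is assumed; the sampling parameters are illustrative.

```python
prompt = "\n\nHuman: How do I politely decline a meeting invite?\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```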