W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 18, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337 is an 8 billion parameter language model, fine-tuned from llama-3-8b-base-sft-hh-harmless-4xh200-batch-64 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is optimized for harmlessness, aligning its responses with human preferences regarding safety and helpfulness. It processes inputs of up to 8192 tokens and is intended for applications requiring robust safety and reduced harmful outputs.
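
A minimal inference sketch, assuming the model is published under the id above on the Hugging Face Hub and loads through the standard Transformers APIs (the bf16 dtype and the Human/Assistant prompt format are assumptions based on hh-rlhf conventions, not details stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id, taken verbatim from the model name above.
model_id = "W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # safe load default; FP8 quantization typically applies at serving time
    device_map="auto",
)

# Models trained on hh-rlhf are conventionally prompted with Human/Assistant turns.
prompt = "\n\nHuman: How can I stay safe when shopping online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```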


Model Overview

W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337 is an 8 billion parameter language model derived from the Llama 3 family. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, building upon a base model (llama-3-8b-base-sft-hh-harmless-4xh200-batch-64) that had already been supervised fine-tuned for harmlessness. The DPO stage aims to further improve the model's ability to generate responses that are safe and aligned with human preferences, specifically targeting the reduction of harmful outputs.
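
For reference, DPO optimizes the following objective over preference pairs, where y_w is the preferred and y_l the rejected response. The "margin" in the model name plausibly refers to a variant that adds a target margin gamma inside the sigmoid; this interpretation is an assumption, as the card does not define it, and gamma = 0 recovers vanilla DPO:

```latex
% Standard DPO objective with an optional target margin \gamma.
% \gamma > 0 is one common reading of "margin-dpo" (an assumption);
% \gamma = 0 recovers the vanilla DPO loss.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        - \gamma
      \right)
    \right]
```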

Key Capabilities

  • Enhanced Harmlessness: Optimized through DPO on a dataset of human feedback focused on harmlessness (see the data-format sketch after this list).
  • Preference Alignment: Designed to generate responses that are more aligned with desired human preferences, particularly in safety.
  • Base Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B base model.
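
A sketch of what the preference data looks like and how it might be split into prompt/chosen/rejected fields for DPO. The harmless-base data_dir and the string-splitting heuristic are assumptions; the card does not specify the exact split or preprocessing used:

```python
from datasets import load_dataset

# Anthropic/hh-rlhf stores each example as two full transcripts,
# "chosen" and "rejected", each a string of alternating
# "\n\nHuman: ..." / "\n\nAssistant: ..." turns.
ds = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base", split="train")

def split_transcript(text):
    # Heuristic: everything up to and including the final "Assistant:"
    # turn is the prompt; the remainder is the response being compared.
    marker = "\n\nAssistant:"
    idx = text.rfind(marker) + len(marker)
    return text[:idx], text[idx:]

def to_preference_pair(example):
    prompt, chosen = split_transcript(example["chosen"])
    _, rejected = split_transcript(example["rejected"])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pairs = ds.map(to_preference_pair, remove_columns=ds.column_names)
print(pairs[0]["prompt"][:200])
```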

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64. Evaluation shows a final loss of 0.5256, with DPO reward-margin metrics (the gap between the implicit rewards of chosen and rejected responses) indicating effective preference alignment. Training used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
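
A minimal sketch of how the reported hyperparameters could map onto TRL's DPOTrainer. Only the learning rate, total batch size, and epoch count come from the card; the per-device/accumulation split (4 per device x 4 accumulation x 4 GPUs = 64), the beta value, and the use of TRL itself are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# SFT base model named in the card (assumed to be loadable by this id).
base_id = "llama-3-8b-base-sft-hh-harmless-4xh200-batch-64"

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

config = DPOConfig(
    output_dir="llama-3-8b-margin-dpo-hh-harmless",
    learning_rate=5e-7,             # from the card
    num_train_epochs=1,             # from the card
    per_device_train_batch_size=4,  # assumption: 4 x 4 x 4 GPUs = 64 total
    gradient_accumulation_steps=4,
    bf16=True,
    beta=0.1,                       # assumption: TRL's default DPO beta
)

trainer = DPOTrainer(
    model=model,                    # ref_model omitted; TRL builds a frozen copy
    args=config,
    train_dataset=pairs,            # prompt/chosen/rejected pairs from the sketch above
    processing_class=tokenizer,
)
trainer.train()
```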