W-61/llama-3-8b-base-beta-dpo-hh-harmless-8xh200

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 11, 2026 · Architecture: Transformer · Cold

W-61/llama-3-8b-base-beta-dpo-hh-harmless-8xh200 is an 8-billion-parameter Llama 3 base model fine-tuned by W-61 with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The fine-tuning specifically targets harmlessness, aiming to reduce harmful or toxic outputs, which makes the model suitable for applications that require a robust, safety-aligned language model.


Model Overview

This model, llama-3-8b-base-beta-dpo-hh-harmless-8xh200, is an 8-billion-parameter variant of the Llama 3 base architecture. It has been fine-tuned by W-61 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, a collection of human preference comparisons over helpful and harmless assistant responses. The fine-tuning run specifically targeted harmlessness.
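
As a starting point, here is a minimal loading-and-generation sketch using the Hugging Face transformers library. The repository identifier is assumed to match the model name shown on this page; substitute the actual Hub path if it differs. Note that device_map="auto" requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, taken from the model name on this page.
model_id = "W-61/llama-3-8b-base-beta-dpo-hh-harmless-8xh200"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`
)

prompt = "How can I stay safe online?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```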

Key Characteristics

  • Base Model: Llama 3 8B base.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Training Data: Anthropic/hh-rlhf dataset, emphasizing harmlessness.
  • Training Results: Achieved a final evaluation loss of 0.5633, with a mean beta_dpo/gap of 8.8052 on the evaluation set (see the loss sketch after this list).
  • Context Length: Supports an 8192-token context window.
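
For context on the objective and the gap metric reported above, the following is a minimal sketch of the standard DPO loss (Rafailov et al., 2023). Variable names, the toy inputs, and the beta value are illustrative assumptions, not taken from W-61's training code; the logged beta_dpo/gap plausibly corresponds to the mean β-scaled reward margin between chosen and rejected responses computed here, but that interpretation is inferred from the metric name.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1):
    """Standard DPO objective. Inputs are per-example summed
    log-probabilities of complete responses. The beta used for this
    model is not published; 0.1 is a placeholder."""
    # Implicit reward of each response: beta * log(pi / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Margin ("gap") between chosen and rejected rewards; its mean over
    # the eval set is plausibly what beta_dpo/gap reports.
    gap = chosen_rewards - rejected_rewards

    # Bradley-Terry negative log-likelihood that chosen beats rejected.
    loss = -F.logsigmoid(gap).mean()
    return loss, gap.mean()

# Toy usage with fabricated log-probs, just to show shapes and signs.
loss, gap_mean = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                          torch.tensor([-13.0]), torch.tensor([-14.0]))
```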

Intended Use Cases

This model is particularly well-suited for applications where safety and the generation of harmless content are critical. Developers can leverage this model for:

  • Content Moderation: Assisting in filtering unsafe content or generating safe alternatives.
  • Chatbots and Assistants: Deploying conversational AI that prioritizes non-toxic and harmless responses.
  • Research: Exploring the effects of DPO fine-tuning on safety alignment in large language models.
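
As a concrete starting point for the chatbot use case, the sketch below formats a conversation in the "\n\nHuman: ... \n\nAssistant:" transcript style used by the Anthropic/hh-rlhf dataset. Whether this checkpoint expects that exact template is an assumption; since it was tuned on hh-rlhf pairs it is a reasonable first guess, but verify against the model's own documentation. The repository id and generation settings are likewise illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-beta-dpo-hh-harmless-8xh200"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def chat(history, user_message, max_new_tokens=256):
    """history: list of (human, assistant) turns. Uses the hh-rlhf prompt
    format as an assumption; the model may expect a different template."""
    prompt = ""
    for human, assistant in history:
        prompt += f"\n\nHuman: {human}\n\nAssistant: {assistant}"
    prompt += f"\n\nHuman: {user_message}\n\nAssistant:"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)
    # Truncate if the model starts writing the next human turn itself.
    return reply.split("\n\nHuman:")[0].strip()

print(chat([], "What should I do if I receive a phishing email?"))
```

Truncating at the next "\n\nHuman:" marker is a common guard for models trained on this transcript format, since a model built on a base checkpoint will otherwise tend to continue the dialogue on the user's behalf.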