W-61/llama-3-8b-base-margin-dpo-hh-harmless-8xh200

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Apr 11, 2026 | Architecture: Transformer

W-61/llama-3-8b-base-margin-dpo-hh-harmless-8xh200 is an 8 billion parameter Llama 3-based language model fine-tuned by W-61. This model is specifically fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to enhance harmlessness. It is designed for applications requiring a robust, safety-aligned conversational AI.


Model Overview

This model, llama-3-8b-base-margin-dpo-hh-harmless-8xh200, is an 8 billion parameter variant of the Llama 3 architecture, developed by W-61. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, building upon the base model W-61/llama-3-8b-base-sft-hh-harmless-8xh200.

Key Characteristics

  • Architecture: Llama 3, 8 billion parameters.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Dataset: Anthropic/hh-rlhf, focusing on harmlessness.
  • Context Length: Supports an 8192-token context window.
  • Performance: Achieves a validation loss of 0.5388 and a margin DPO mean of 7.1205 on the evaluation set, indicating that the policy consistently assigns higher reward to preferred, harmless responses.
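The "margin DPO mean" metric above presumably refers to the mean reward margin between chosen and rejected responses under the DPO objective. A minimal sketch of the per-pair DPO loss and margin is shown below (the `beta` value is an assumption for illustration; the actual training hyperparameters are not listed on this card):

```python
import math

def dpo_loss_and_margin(policy_chosen_logp, policy_rejected_logp,
                        ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    policy or the frozen SFT reference model. Returns (loss, margin).
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), written stably
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, margin

# Illustrative numbers: the policy assigns the chosen response a higher
# log-probability (relative to the reference) than the rejected one.
loss, margin = dpo_loss_and_margin(-40.0, -60.0, -50.0, -55.0, beta=0.1)
```

A larger positive margin drives the loss toward zero, which is why a high mean margin on the evaluation set signals alignment with the preferred (harmless) responses.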

Intended Use Cases

This model is particularly suited for applications where generating harmless and safety-aligned text is critical. Its DPO fine-tuning on a human feedback dataset suggests improved performance in avoiding undesirable or harmful outputs, making it a strong candidate for:

  • Safety-critical AI assistants
  • Content moderation tools
  • Conversational agents requiring strong ethical guidelines
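Because the model was preference-tuned on Anthropic/hh-rlhf, prompts in that dataset's Human/Assistant turn format are a natural fit. A hedged sketch of such a prompt builder (the exact template used during training is not documented on this card, so this follows the dataset's published convention):

```python
def format_hh_prompt(turns):
    """Format (role, text) turns in the Anthropic hh-rlhf style,
    ending with an open Assistant turn for the model to complete."""
    parts = [f"\n\n{role}: {text}" for role, text in turns]
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([
    ("Human", "How do I dispose of old batteries safely?"),
])
```

The resulting string would then be passed to the model's generation API; the trailing open `Assistant:` turn cues the model to produce the next (safety-aligned) reply.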