W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer · Cold

The W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924 model is an 8 billion parameter Llama 3 base model, fine-tuned by W-61 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The fine-tuning targets helpfulness and alignment, making the model suitable for conversational AI and instruction-following tasks. Its 8192-token context length supports coherent, contextually relevant responses across longer exchanges.


Overview

This model, developed by W-61, is an 8 billion parameter Llama 3 base model that has undergone fine-tuning using Direct Preference Optimization (DPO). The training specifically utilized the Anthropic/hh-rlhf dataset, indicating an emphasis on aligning the model's outputs with human preferences for helpfulness and safety.
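For reference, the DPO objective used in this kind of fine-tuning can be sketched per example. This is a minimal illustration of the standard DPO loss, not this model's actual training code; the `beta` value here is illustrative and not a confirmed training setting.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed log-probability the policy or frozen
    reference model assigns to the chosen/rejected response.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log(1.0 + math.exp(-logits))
```

When the policy prefers the chosen response more strongly than the reference does, `logits` is positive and the loss falls below `log 2`; gradient descent on this loss pushes the policy toward the human-preferred responses while the reference term keeps it anchored.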

Key Capabilities

  • Preference Alignment: Fine-tuned with DPO on the Anthropic/hh-rlhf dataset, suggesting improved helpfulness and reduced harmfulness in responses.
  • Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B base model.
  • Context Handling: Supports a context length of 8192 tokens, enabling it to process and generate longer, more detailed interactions.

Training Details

The model was trained with a learning rate of 5e-07, a total batch size of 64, and the AdamW optimizer. Training ran on 4 GPUs with gradient accumulation of 2 steps over 1 epoch. The very low learning rate is typical for DPO fine-tuning, where the policy should stay close to the reference model while absorbing the preference signal.
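The stated total batch size of 64 implies a per-device micro-batch of 8 (this per-device value is inferred from the other figures, not stated in the card):

```python
# Effective (total) batch size = GPUs * per-device micro-batch * accumulation steps.
num_gpus = 4            # stated: 4 GPUs (the "4xh200" in the model name)
grad_accum_steps = 2    # stated: gradient accumulation of 2 steps
total_batch_size = 64   # stated: total batch size

# Per-device micro-batch implied by the stated configuration:
per_device_batch = total_batch_size // (num_gpus * grad_accum_steps)
print(per_device_batch)  # -> 8
assert num_gpus * per_device_batch * grad_accum_steps == total_batch_size
```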

Good for

  • Developing conversational AI agents that require helpful and aligned responses.
  • Applications where human preference alignment is a critical factor.
  • Instruction-following tasks where the model needs to adhere to specific guidelines.