W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-20260428-045924

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-20260428-045924 is an 8 billion parameter language model, fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200. This model was specifically fine-tuned using the Anthropic/hh-rlhf dataset, indicating an optimization for harmlessness and helpfulness in conversational AI. It operates with an 8192-token context length, making it suitable for applications requiring robust, safety-aligned text generation.
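A minimal usage sketch is shown below. Two assumptions are not confirmed by this card: that the model loads through the standard Hugging Face `transformers` causal-LM interface, and that prompts follow the `\n\nHuman: ... \n\nAssistant:` turn format used by the Anthropic/hh-rlhf dataset it was tuned on.

```python
# Illustrative sketch only. Assumptions (not confirmed by this model card):
# - the model loads via the standard transformers AutoModelForCausalLM API;
# - prompts use the "\n\nHuman: ... \n\nAssistant:" format from Anthropic/hh-rlhf.

def format_hh_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the hh-rlhf conversation style."""
    return f"\n\nHuman: {user_message}\n\nAssistant:"

prompt = format_hh_prompt("How do I politely decline a meeting invitation?")

# Generation would then look roughly like (not executed here):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   name = ("W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-"
#           "q_t-0.45-eta-0.1-s_star-0.8-20260428-045924")
#   tok = AutoTokenizer.from_pretrained(name)
#   model = AutoModelForCausalLM.from_pretrained(name)
#   out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
```

Keeping generation within the advertised 8192-token context (prompt plus generated tokens) avoids truncation.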


Model Overview

This model, llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.8-20260428-045924, is an 8 billion parameter language model developed by W-61. It was fine-tuned from the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 SFT checkpoint via Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf preference dataset.

Key Capabilities

  • Harmlessness and Helpfulness: Fine-tuned on the Anthropic/hh-rlhf dataset, suggesting an emphasis on generating responses that are both safe and useful.
  • Base Model Enhancement: Builds upon an existing Llama 3 8B base model, inheriting its foundational language understanding and generation capabilities.

Training Details

The model was trained with the following key hyperparameters:

  • Learning Rate: 5e-07
  • Total Train Batch Size: 64 (across 4 devices with gradient accumulation)
  • Optimizer: AdamW (`adamw_torch`)
  • Epochs: 1
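For orientation, the standard DPO objective (Rafailov et al., 2023) can be sketched in plain Python. Note the hedges: the `q_t`, `eta`, and `s_star` values in the run name suggest a modified preference objective whose exact form this card does not document, so the sketch below shows only the vanilla DPO loss, and the `beta` value in it is purely illustrative.

```python
import math

# Sketch of the standard per-example DPO loss:
#   L = -log sigmoid(beta * ((chosen log-ratio) - (rejected log-ratio)))
# where each log-ratio is log pi_theta(y|x) - log pi_ref(y|x).
# The q_t / eta / s_star parameters in this run's name hint at a modified
# objective that is not documented here; beta=0.1 below is illustrative only.

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss from summed token log-probabilities of one preference pair."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# At zero margin the loss is log(2) ~= 0.693; it shrinks as the policy favors
# the chosen response more strongly than the reference model does.
# Batch arithmetic from the card: 64 total across 4 devices means
# per-device batch * gradient-accumulation steps = 16 (the split is not stated).
```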

Good For

  • Applications requiring safety-aligned conversational AI.
  • Use cases where harmless and helpful text generation is critical.
  • Further research and development into DPO-based fine-tuning for ethical AI.