W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer · Cold

W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8 is an 8 billion parameter language model developed by W-61, fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200. It was fine-tuned on the Anthropic/hh-rlhf dataset, indicating it was optimized for harmlessness and helpfulness. With a context length of 8192 tokens, it is designed for conversational AI applications that must adhere to safety guidelines.


Model Overview

This model, W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8, is an 8 billion parameter language model. It is a fine-tuned variant of W-61/llama-3-8b-base-sft-hh-harmless-4xh200, developed by W-61.

Key Capabilities

  • Fine-tuned for Harmlessness and Helpfulness: The model has undergone fine-tuning using the Anthropic/hh-rlhf dataset, which is typically used to align models with human preferences for safety and helpfulness.
  • Base Model Architecture: Inherits the foundational capabilities of the Llama 3 8B base model.
  • Context Length: Supports a context window of 8192 tokens, suitable for processing moderately long inputs and generating coherent responses.
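The "dpo" in the model name suggests the preference fine-tuning step used Direct Preference Optimization on the hh-rlhf preference pairs. A minimal sketch of the per-pair DPO objective is below; the function name and the `beta` value are illustrative assumptions, not details taken from this model card:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference (SFT)
    model. beta controls how far the policy may drift from the
    reference; 0.1 here is illustrative, not from the model card.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, and grows when it favors the rejected one.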

Training Details

The model was trained with a learning rate of 5e-07, a total batch size of 64 (across 4 GPUs with 2 gradient accumulation steps), and a cosine learning rate scheduler with a 0.1 warmup ratio. Training ran for 1 epoch. The optimizer was AdamW with default betas and epsilon. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
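The stated schedule (peak learning rate 5e-07, cosine decay, 0.1 warmup ratio) can be reproduced with a few lines of arithmetic. This standalone sketch mirrors what a library scheduler such as `get_cosine_schedule_with_warmup` in Transformers computes; the step counts are placeholders:

```python
import math

PEAK_LR = 5e-07
WARMUP_RATIO = 0.1

def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 4 GPUs x 2 gradient-accumulation steps
# x 8 samples per device (the per-device size of 8 is implied by
# the stated total of 64, not listed explicitly on the card).
effective_batch = 4 * 2 * 8  # 64
```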

Intended Use Cases

Given its fine-tuning on the Anthropic/hh-rlhf dataset, this model is likely best suited for applications where generating safe, helpful, and harmless text is a priority. This could include chatbots, content moderation, or general-purpose conversational agents that require strong alignment with ethical guidelines.
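The Anthropic/hh-rlhf dataset formats conversations as alternating "Human:"/"Assistant:" turns, so a prompt built in that style is a reasonable starting point when querying this model. The helper below is an illustrative sketch, not an official interface for this model:

```python
def build_hh_prompt(turns):
    """Format a conversation in the hh-rlhf style: alternating
    '\n\nHuman:' / '\n\nAssistant:' turns, ending with an open
    Assistant turn for the model to complete."""
    parts = []
    for i, text in enumerate(turns):
        role = "Human" if i % 2 == 0 else "Assistant"
        parts.append(f"\n\n{role}: {text}")
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = build_hh_prompt(["How do I store leftovers safely?"])
```

The resulting string can be passed to any standard text-generation pipeline; multi-turn histories are handled by alternating user and model messages in the input list.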