W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05 is an 8 billion parameter language model fine-tuned by W-61. It is based on the Llama 3 architecture and has been further optimized using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is designed for helpful conversational AI applications, and its 8192-token context length supports extended interactions.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05, is an 8 billion parameter language model developed by W-61. It is a fine-tuned variant of the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 base model, specifically enhanced through Direct Preference Optimization (DPO).

Key Characteristics

  • Base Architecture: Built upon the Llama 3 family of models.
  • Parameter Count: Features 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports an 8192-token context window, enabling the processing of longer inputs and the generation of more coherent, extended responses.
  • Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment, trained on the Anthropic/hh-rlhf dataset. This method aims to improve the model's helpfulness and adherence to human preferences.
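At its core, DPO optimizes the policy to widen the margin between its log-probability ratios (policy vs. a frozen reference model) for chosen versus rejected responses from a preference dataset such as Anthropic/hh-rlhf. A minimal sketch of the per-example loss, using an illustrative `beta` of 0.1 (the card does not state the value used for this model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin), where the
    margin is the policy-vs-reference log-ratio for the chosen
    response minus that for the rejected response."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) written in the numerically stable form log(1 + exp(-x))
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree, the loss sits at log 2; it decreases as the policy raises the chosen response's likelihood relative to the rejected one.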

Training Details

The model underwent a single epoch of training with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs (H200s, per the model name). It employed the AdamW optimizer with a cosine learning rate scheduler and a 0.1 warmup ratio. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
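The schedule above can be sketched as follows: linear warmup over the first 10% of steps to the 5e-07 peak, then cosine decay. This is a minimal illustration of the standard warmup-plus-cosine shape; that the decay ends exactly at zero is an assumption, not stated in the card.

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup for the first
    warmup_ratio fraction of steps, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))
```

For a 1000-step run this yields zero at step 0, the 5e-07 peak at step 100, and a smooth decay back toward zero by the final step.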

Intended Use Cases

Given its DPO fine-tuning on a helpfulness dataset, this model is primarily intended for applications requiring helpful and aligned conversational AI. It is suitable for tasks where generating informative, safe, and user-friendly responses is crucial.