W-61/llama-3-8b-base-margin-dpo-4xh100-real

Text Generation

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 8k
  • Published: Mar 28, 2026
  • Architecture: Transformer

W-61/llama-3-8b-base-margin-dpo-4xh100-real is an 8 billion parameter language model fine-tuned from princeton-nlp/Llama-3-Base-8B-SFT. It was trained with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset, optimizing it for instruction following and preference alignment. It is designed for general language generation tasks where responses refined by human feedback are beneficial.


Overview

W-61/llama-3-8b-base-margin-dpo-4xh100-real is an 8 billion parameter language model derived from the princeton-nlp/Llama-3-Base-8B-SFT base model. It has been fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, a collection of binarized human preference pairs, to improve response quality and alignment with user instructions.
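DPO trains the policy directly on preference pairs, without a separate reward model: it rewards the policy for assigning a larger log-probability margin to the chosen response than the frozen reference (SFT) model does. A minimal sketch of the standard per-pair DPO objective (the `beta` value and helper function are illustrative; the "margin" in the model name may refer to a margin variant of the loss not shown here):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    responses under the policy and the frozen reference (SFT) model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the rejected one, relative to the reference.
    margin = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Loss shrinks as the policy widens the margin toward the chosen response.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0) < dpo_loss(-12.0, -10.0, -11.0, -11.0))
```

At a margin of zero the loss is exactly log 2, and it decreases monotonically as the margin grows, which is what pushes probability mass toward preferred responses during training.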

Key Characteristics

  • Base Model: Llama-3-Base-8B-SFT, providing a strong foundation for language understanding and generation.
  • Fine-tuning Method: Direct Preference Optimization (DPO), aimed at enhancing the model's ability to generate preferred responses based on human feedback.
  • Training Data: HuggingFaceH4/ultrafeedback_binarized dataset, a common choice for preference alignment tasks.
  • Context Length: Supports an 8192 token context window.
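The ultrafeedback_binarized dataset pairs each prompt with a preferred ("chosen") and a dispreferred ("rejected") conversation. A purely illustrative record in that layout (the content strings below are invented, not taken from the dataset):

```python
# Illustrative example in the chosen/rejected message-list format used
# by HuggingFaceH4/ultrafeedback_binarized. All text here is made up.
example = {
    "prompt": "Explain what DPO is in one sentence.",
    "chosen": [
        {"role": "user", "content": "Explain what DPO is in one sentence."},
        {"role": "assistant", "content": "DPO fine-tunes a model directly on preference pairs."},
    ],
    "rejected": [
        {"role": "user", "content": "Explain what DPO is in one sentence."},
        {"role": "assistant", "content": "DPO is a thing."},
    ],
}

# DPO training scores the final assistant turn of each branch.
chosen_reply = example["rejected" if False else "chosen"][-1]["content"]
rejected_reply = example["rejected"][-1]["content"]
print(chosen_reply != rejected_reply)  # → True
```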

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and an effective batch size of 128 (4 GPUs × a per-device batch of 2 × 16 gradient accumulation steps). The optimizer was Adam with standard betas and epsilon, paired with a cosine learning-rate schedule using a 0.05 warmup ratio.
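The schedule above can be sketched in a few lines. `TOTAL_STEPS` and the per-device batch of 2 are illustrative assumptions (the actual step count depends on dataset size; 128 / (4 GPUs × 16 accumulation steps) = 2 sequences per device):

```python
import math

TOTAL_STEPS = 1000            # illustrative; real value depends on dataset size
WARMUP = int(0.05 * TOTAL_STEPS)   # 0.05 warmup ratio → first 5% of steps
PEAK_LR = 5e-07

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL_STEPS - WARMUP)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 4 GPUs x 2 sequences per device x 16 accumulation steps
effective_batch = 4 * 2 * 16
print(effective_batch)  # → 128
```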

Potential Use Cases

This model is likely suitable for applications requiring high-quality, aligned text generation, such as:

  • Instruction following chatbots.
  • Content generation that adheres to specific stylistic or factual preferences.
  • Tasks where human-like response quality is prioritized.