jackf857/llama-3-8b-base-margin-dpo-4xh100

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Apr 2, 2026 · License: llama3 · Architecture: Transformer

jackf857/llama-3-8b-base-margin-dpo-4xh100 is an 8-billion-parameter Llama 3 base model, fine-tuned with DPO on the HuggingFaceH4/ultrafeedback_binarized dataset. Direct preference optimization improves response quality and alignment, making the model suitable for general language understanding and generation tasks.


Model Overview

The jackf857/llama-3-8b-base-margin-dpo-4xh100 is an 8 billion parameter language model based on the Llama 3 architecture. It is a fine-tuned variant of the W-61/llama-3-8b-base-ultrachat-sft-4xh100 model, specifically enhanced through Direct Preference Optimization (DPO).
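DPO trains the policy directly on preference pairs: for each prompt it pushes the policy's implicit reward for the chosen response above that of the rejected one, measured relative to a frozen reference model. The "margin" in this model's name suggests the margin variant of the per-pair loss, where a margin is subtracted inside the log-sigmoid. A minimal sketch of that loss, assuming the standard DPO formulation (the card does not publish the exact loss function, so the margin term here is an assumption based on the model name):

```python
import math


def dpo_margin_loss(
    pi_logp_chosen: float,
    pi_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
    margin: float = 0.0,
) -> float:
    """Per-pair DPO loss with an optional margin term (hypothetical sketch)."""
    # Implicit reward gap: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    logits = (pi_logp_chosen - ref_logp_chosen) - (pi_logp_rejected - ref_logp_rejected)
    # Margin variant: require the scaled gap to exceed the margin.
    x = beta * logits - margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), written stably with log1p.
    return math.log1p(math.exp(-x))
```

With identical policy and reference log-probabilities the gap is zero and the loss is log 2; a larger margin makes the same pair cost more, forcing a wider preference gap.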

Key Characteristics

  • Base Model: Llama 3, 8 billion parameters.
  • Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment and quality improvement.
  • Training Data: Fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, which is designed for preference-based learning.
  • Training Configuration: Trained with a learning rate of 5e-07, a total batch size of 128, and 1 epoch, using the Adam optimizer with a cosine learning-rate scheduler.
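The reported hyperparameters can be expressed with the `trl` library's `DPOConfig`, a common way to run DPO fine-tuning. This is a hypothetical reconstruction: the card does not state which training framework was used, and the per-device batch split and `beta` value below are assumptions.

```python
from trl import DPOConfig

# Hypothetical reconstruction of the reported training setup;
# the actual training script is not published with the card.
config = DPOConfig(
    output_dir="llama-3-8b-base-margin-dpo",
    learning_rate=5e-7,              # reported learning rate
    per_device_train_batch_size=8,   # 8 per GPU x 4 H100s x grad-accum 4 = 128 total (assumed split)
    gradient_accumulation_steps=4,
    num_train_epochs=1,              # reported: 1 epoch
    lr_scheduler_type="cosine",      # reported scheduler
    optim="adamw_torch",             # Adam-family optimizer, as reported
    beta=0.1,                        # DPO temperature; not reported, assumed
)
```

The config would then be passed to a `DPOTrainer` together with the policy model, a frozen reference model, and the preference dataset.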

Intended Use Cases

This model is designed for applications requiring a robust 8B parameter language model with enhanced response quality due to its DPO fine-tuning. It is suitable for a range of general-purpose natural language processing tasks, including text generation, summarization, and conversational AI, where aligned and preferred outputs are critical.
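For inference, a minimal sketch using the Hugging Face `transformers` API (the model id is taken from the card; the generation parameters are illustrative, not recommendations):

```python
def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a completion from the model (needs a suitably large GPU)."""
    # Heavy dependencies are imported lazily so the sketch stays
    # importable without transformers/torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "jackf857/llama-3-8b-base-margin-dpo-4xh100"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("Summarize the benefits of preference-based fine-tuning:"))
```

Note that this is a fine-tune of a Llama 3 *base* model, so plain text completion (rather than a chat template) is assumed here.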