jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 14, 2026 · Architecture: Transformer

jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200 is an 8 billion parameter Llama 3 base model, fine-tuned with ORPO (Odds Ratio Preference Optimization) on the HuggingFaceH4/ultrafeedback_binarized dataset. It is derived from W-61/llama-3-8b-base-sft-ultrachat-8xh200 and is optimized for preference alignment, reaching a rewards accuracy of 0.6048 on the evaluation set when distinguishing chosen from rejected responses. It is intended for human-aligned text generation and conversational AI.

Model Overview

This model, jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200, is an 8 billion parameter language model based on the Llama 3 architecture. It has been fine-tuned using the Odds Ratio Preference Optimization (ORPO) method, building upon the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model. The fine-tuning process utilized the HuggingFaceH4/ultrafeedback_binarized dataset, aiming to align the model's outputs more closely with human preferences.

Key Characteristics

  • Base Model: Llama 3, 8B parameters, starting from the W-61/llama-3-8b-base-sft-ultrachat-8xh200 checkpoint.
  • Fine-tuning Method: ORPO (Odds Ratio Preference Optimization).
  • Dataset: Fine-tuned on HuggingFaceH4/ultrafeedback_binarized for preference alignment.
  • Performance Metrics: Rewards accuracy of 0.6048 on the evaluation set, meaning the chosen response received the higher reward than the rejected one in about 60% of evaluation pairs.
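
To make the ORPO term and the rewards accuracy metric above more concrete, here is a minimal pure-Python sketch. It assumes the standard ORPO formulation, in which the odds of a response are computed from its length-averaged log-probability, and it assumes rewards accuracy is the fraction of pairs whose chosen reward exceeds the rejected reward. The function names are hypothetical and are not taken from the training code used for this model.

```python
import math

def log_odds(avg_logprob: float) -> float:
    """Log odds of a response, log(p / (1 - p)), where p = exp(avg_logprob).

    avg_logprob is the length-averaged token log-probability and must be < 0.
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)

def odds_ratio_loss(chosen_avg_logprob: float, rejected_avg_logprob: float) -> float:
    """Odds-ratio penalty: -log(sigmoid(log_odds(chosen) - log_odds(rejected)))."""
    log_ratio = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
    return math.log1p(math.exp(-log_ratio))

def rewards_accuracy(reward_pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen_reward, rejected_reward) pairs where chosen wins."""
    wins = sum(1 for chosen, rejected in reward_pairs if chosen > rejected)
    return wins / len(reward_pairs)

# Illustrative values only, not measurements from this model.
example_pairs = [(-0.5, -1.0), (-2.0, -1.0)]
example_accuracy = rewards_accuracy(example_pairs)
```

In the full ORPO objective this penalty is added, with a weighting factor, to the ordinary supervised fine-tuning loss on the chosen response; the weight used for this model is not stated here.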

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 128. Training ran on 8 GPUs with 4 gradient accumulation steps, which implies a per-device batch size of 4 (128 / (8 × 4)). The optimizer was AdamW with a cosine learning rate schedule.
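
The batch-size arithmetic and the shape of a cosine schedule can be sketched as follows. This assumes the total batch size is the product of GPUs, gradient accumulation steps, and per-device batch size, and it assumes a plain cosine decay to zero with no warmup; warmup settings and the total step count are not given above, so `total_steps` is a hypothetical parameter.

```python
import math

TOTAL_BATCH_SIZE = 128
NUM_GPUS = 8
GRAD_ACCUM_STEPS = 4
PEAK_LR = 5e-7

# Per-device batch size implied by the reported totals.
per_device_batch_size = TOTAL_BATCH_SIZE // (NUM_GPUS * GRAD_ACCUM_STEPS)

def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these assumptions, the learning rate starts at 5e-07, passes through roughly half that value at the midpoint of training, and reaches zero at the final step.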

Potential Use Cases

This model targets applications that need generated text aligned with human preferences. Its ORPO fine-tuning makes it a candidate for:

  • Chatbots and Conversational AI: Producing more natural and helpful responses.
  • Content Generation: Creating text that aligns with specific quality or style preferences.
  • Instruction Following: Generating outputs that better adhere to given instructions and user preferences.
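
For trying the use cases above, a minimal sketch of text generation with the Hugging Face `transformers` library might look like this. It assumes the weights are available on the Hugging Face Hub under the repository name shown, that `transformers` and `torch` are installed, and that a plain-text prompt is acceptable; whether the tokenizer ships a chat template is not confirmed here. The function name and default values are hypothetical.

```python
MODEL_ID = "jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200"

def generate_text(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and return a completion for a plain-text prompt."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_text("Explain gradient accumulation in one paragraph.")` downloads the weights on first use, so it needs network access and enough memory for an 8B parameter model.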