jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: Apr 14, 2026 | Architecture: Transformer

jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200 with CPO (Contrastive Preference Optimization) on the HuggingFaceH4/ultrafeedback_binarized dataset. The preference training aims to align the model's responses with human preferences, so it is intended for applications that need preference-aligned, contextually appropriate text.


Model Overview

This model, jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200, is an 8-billion-parameter language model derived from the supervised fine-tuned checkpoint W-61/llama-3-8b-base-sft-ultrachat-8xh200. It was further fine-tuned with Contrastive Preference Optimization (CPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.
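Assuming CPO here refers to Contrastive Preference Optimization, the reference-free method implemented by TRL's `CPOTrainer`, its per-pair objective can be sketched as a sigmoid preference loss on β-scaled policy log-probabilities plus a negative log-likelihood (NLL) term on the chosen response. In TRL's implementation the logged chosen/rejected rewards are these β-scaled log-probabilities, which would explain why the rewards reported below are negative. The helper names and the `beta=0.1` and `alpha=1.0` values are illustrative assumptions; the card does not state the hyperparameters used, and library implementations may normalize the NLL term per token.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(chosen_logp: float, rejected_logp: float,
             beta: float = 0.1, alpha: float = 1.0) -> float:
    """CPO loss for a single preference pair (illustrative sketch).

    chosen_logp / rejected_logp: summed log-probabilities log pi(y|x) that the
    policy assigns to the chosen and rejected responses. No reference model
    is involved.
    beta, alpha: assumed hyperparameters, not taken from the model card.
    """
    # Implicit rewards: beta-scaled policy log-probabilities.
    chosen_reward = beta * chosen_logp
    rejected_reward = beta * rejected_logp
    # Preference term: -log sigmoid(reward margin).
    prefer = -log_sigmoid(chosen_reward - rejected_reward)
    # Behaviour-cloning term: negative log-likelihood of the chosen response.
    nll = -chosen_logp
    return prefer + alpha * nll


# Hypothetical log-probabilities for one preference pair.
print(round(cpo_loss(chosen_logp=-10.0, rejected_logp=-12.0), 4))  # prints 10.5981
```

Because no frozen reference model is needed, each training step only requires forward passes through the policy, which is the main practical difference from DPO-style objectives.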

Key Characteristics

  • Preference Alignment: Optimized through CPO to align with human preferences, aiming for more desirable and contextually appropriate responses.
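  • Performance Metrics: On the evaluation set, the model reached a rewards accuracy of 0.625 (the chosen response received the higher implicit reward in 62.5% of preference pairs), with a mean chosen reward of -36.8871 and a mean rejected reward of -38.7328. The positive average margin of about 1.85 indicates that the model assigns higher reward to preferred outputs than to non-preferred ones more often than not.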
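  • Training Details: Trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 128 across 8 GPUs, i.e. 16 examples per GPU per optimizer step (the split between per-device batch size and gradient accumulation is not stated).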
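The reward metrics above can be illustrated with a small sketch of how preference trainers typically summarize per-pair implicit rewards. `preference_metrics` is a hypothetical helper, its key names are assumed to follow TRL-style logging conventions, and the input values are made-up toy numbers, not this model's evaluation data.

```python
from statistics import mean


def preference_metrics(chosen_rewards, rejected_rewards):
    """Summarise per-pair implicit rewards for a batch of preference pairs.

    Hypothetical helper; key names follow TRL-style logging conventions.
    """
    if not chosen_rewards or len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("expected two non-empty lists of equal length")
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response out-scores the rejected one.
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        # Mean margin; equals mean(chosen) - mean(rejected) by linearity.
        "rewards/margins": mean(margins),
    }


# Hypothetical rewards for four evaluation pairs (not the model's actual data).
metrics = preference_metrics([-30.0, -40.0, -35.0, -42.0],
                             [-33.0, -38.0, -39.0, -45.0])
print(metrics)
# {'rewards/chosen': -36.75, 'rewards/rejected': -38.75, 'rewards/accuracies': 0.75, 'rewards/margins': 2.0}
```

Applied to the card's reported means, the same identity gives an average margin of -36.8871 - (-38.7328) = 1.8457. The 0.625 accuracy cannot be recovered from the means alone, because it depends on how the margins are distributed across individual pairs.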

Potential Use Cases

This model is intended for applications where generated text must meet specific preference or quality criteria. Its CPO fine-tuning may offer advantages in:

  • Dialogue Systems: Generating more natural and preferred conversational responses.
  • Content Generation: Producing outputs that are better aligned with user expectations or ethical guidelines.
  • Instruction Following: Improving the quality and relevance of responses to complex instructions.