jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer · Status: Cold

The jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616 model is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857 on the HuggingFaceH4/ultrafeedback_binarized dataset, using IPO-style preference learning to improve response quality. It is designed for general-purpose conversational AI and instruction following, and demonstrates improved alignment over its base model.


Model Overview

This model, jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616, is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model, optimized on the HuggingFaceH4/ultrafeedback_binarized preference dataset using IPO (Identity Preference Optimization), as the model name indicates. The fine-tuning aimed to improve the model's ability to generate the responses humans prefer.
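For reference, here is a minimal inference sketch using Hugging Face Transformers. The chat-style prompt assumes the tokenizer inherits a chat template from its UltraChat SFT ancestor, and the generation settings (bf16, temperature, token budget) are illustrative choices rather than values published with this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is a sensible default for an 8B model
    device_map="auto",
)

messages = [{"role": "user", "content": "Give three tips for writing clear documentation."}]

# Assumes a chat template is present; fall back to a plain string prompt if not.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```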

Key Capabilities

  • Preference Alignment: Fine-tuned on the Ultrafeedback preference dataset, suggesting improved alignment with human preferences for response quality (see the dataset sketch after this list).
  • Instruction Following: As a fine-tuned model, it is intended for general instruction-following tasks.
  • Base Model Performance: Builds upon the Llama 3 8B base model, inheriting its foundational language understanding and generation capabilities.
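To make "preference data" concrete, the snippet below loads one example from the dataset this model was tuned on. The split and field names follow the public HuggingFaceH4/ultrafeedback_binarized dataset card; verify them against the dataset viewer before relying on this sketch:

```python
from datasets import load_dataset

# "train_prefs" is the preference-tuning split of ultrafeedback_binarized.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = ds[0]
print(example["prompt"])        # the user instruction
print(example["chosen"][-1])    # final turn of the preferred conversation
print(example["rejected"][-1])  # final turn of the dispreferred conversation
```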

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07, using a total batch size of 128 across 4 GPUs (H200s, per the model name). Evaluation during training reported a rewards accuracy of 0.6800, meaning the model's implicit reward ranked the chosen response above the rejected one in 68% of evaluation pairs. The training run used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
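As a rough guide, the hyperparameters above map onto TRL's DPOTrainer with its IPO loss as sketched below. This is not the author's training script: the per-device batch / gradient-accumulation split is an assumption (only the total of 128 is reported), TRL argument names vary across versions, and the raw dataset may need reformatting into the prompt/chosen/rejected layout TRL expects:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-ultrachat-8xh200"  # SFT checkpoint this model starts from

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="llama-3-8b-base-ipo-ultrafeedback",
    loss_type="ipo",                # IPO variant of the preference loss
    learning_rate=5e-7,             # from the model card
    num_train_epochs=1,             # from the model card
    per_device_train_batch_size=4,  # assumption: 4 per device x 8 accumulation x 4 GPUs = 128 total
    gradient_accumulation_steps=8,
    bf16=True,                      # assumption
)

# DPOTrainer builds a frozen reference model internally when none is passed.
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

The IPO loss regresses the log-likelihood-ratio margin between chosen and rejected responses toward a fixed target, which makes it less prone to overfitting the preference labels than the plain DPO objective; this matches the very low learning rate and single epoch reported above.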