jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616
The jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616 model is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857. As the model name indicates, it was trained with IPO (identity preference optimization) on the HuggingFaceH4/ultrafeedback_binarized dataset, with the goal of improving response quality through preference learning. The model targets general-purpose conversational AI and instruction following, and is intended to show improved alignment over its base model.
Model Overview
This model, jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616, is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration of W-61/llama-3-8b-base-sft-ultrachat-8xh200, an SFT checkpoint, further optimized with IPO on the HuggingFaceH4/ultrafeedback_binarized preference dataset. The goal of this fine-tuning stage is to make the model more likely to generate the responses humans prefer, as captured by the chosen/rejected pairs in that dataset.
Key Capabilities
- Preference Alignment: Fine-tuned with the Ultrafeedback dataset, suggesting improved alignment with human preferences for response quality.
- Instruction Following: As a fine-tuned model, it is intended for general instruction-following tasks.
- Base Model Performance: Builds upon the Llama 3 8B base model, inheriting its foundational language understanding and generation capabilities.
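The model should load with the standard transformers API. The sketch below is a hedged usage example: the model id comes from this card, but the dtype, device placement, prompt, and generation settings are illustrative defaults, not settings documented by the author.

```python
# Hedged usage sketch: plain transformers text generation with this checkpoint.
# Generation settings below are illustrative, not from the training run.

MODEL_ID = "jackf857/llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260428-004616"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imports are deferred so the sketch can be read without the heavy
    # dependencies installed; loading the 8B weights happens on first call.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is typical for Llama 3 8B
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated continuation, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters can be passed to `generate` as usual.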
Training Details
The model was trained for 1 epoch with a learning rate of 5e-7 and a total batch size of 128 across 4 GPUs. During training, the rewards accuracy reached 0.6800, i.e. for 68% of preference pairs the model's implicit reward ranked the chosen response above the rejected one. The training used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
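The "ipo" tag in the model name points to the identity preference optimization objective, which regresses the policy/reference log-ratio margin toward a fixed target instead of applying DPO's sigmoid loss. The following is a minimal, self-contained sketch of that loss and of the rewards-accuracy metric reported above; the beta value and log-probabilities are toy numbers for illustration, not the run's actual hyperparameters.

```python
# Hedged sketch of the IPO objective and the rewards-accuracy metric.
# All names and numbers are illustrative; beta is a toy value.

def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # IPO regresses the log-ratio margin toward 1 / (2 * beta).
    return (margin - 1.0 / (2.0 * beta)) ** 2

def rewards_accuracy(pairs, beta: float = 0.1) -> float:
    # "Rewards accuracy": fraction of preference pairs where the implicit
    # reward of the chosen response exceeds that of the rejected one.
    correct = 0
    for pc, pr, rc, rr in pairs:
        chosen_reward = beta * (pc - rc)
        rejected_reward = beta * (pr - rr)
        correct += chosen_reward > rejected_reward
    return correct / len(pairs)
```

In this formulation a rewards accuracy of 0.68 simply means 68% of evaluation pairs had a positive margin.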