jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623
jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623 is an 8-billion-parameter Llama 3 model fine-tuned by jackf857 on the HuggingFaceH4/ultrafeedback_binarized preference dataset, building on a previously instruction-tuned Llama 3 variant. As the checkpoint name indicates, training used a SLiC-HF-style preference objective on 4 H200 GPUs with a total batch size of 128, and its evaluation logs report preference-learning metrics such as rewards/chosen and rewards/rejected.
Model Overview
This model, jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623, is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857 on the HuggingFaceH4/ultrafeedback_binarized dataset, starting from the W-61/llama-3-8b-base-sft-ultrachat-8xh200 checkpoint.
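A minimal loading and generation sketch with the transformers library follows; the bfloat16 dtype, device_map setting, and prompt are illustrative choices, not part of the original card:

```python
# Minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face
# Hub under the repository name shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B parameters; bf16 keeps memory use manageable
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```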
Key Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 128, distributed across 4 GPUs. The final evaluation metrics report a loss of 341.8599, rewards/chosen of -260.7975, and rewards/rejected of -247.1082. The rewards/accuracies value of 0.4935 means the model scored the chosen response above the rejected one in roughly 49% of evaluation pairs.
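The "slic-hf" in the checkpoint name suggests a SLiC-HF-style (sequence likelihood calibration with human feedback) objective, where a response's reward is typically its total log-probability under the policy and a hinge loss enforces a margin between chosen and rejected responses. The sketch below illustrates that general recipe; it is an assumption based on the name, not the author's actual training code, and the margin delta is a placeholder:

```python
# Illustrative SLiC-style calibration loss; an assumption inferred from the
# checkpoint name, not the author's exact training code.
import torch
import torch.nn.functional as F

def slic_hinge_loss(chosen_logps: torch.Tensor,
                    rejected_logps: torch.Tensor,
                    delta: float = 1.0) -> torch.Tensor:
    # chosen_logps / rejected_logps are summed token log-probabilities for
    # each response pair; they correspond to rewards/chosen and
    # rewards/rejected in the evaluation logs.
    return F.relu(delta - (chosen_logps - rejected_logps)).mean()

def reward_accuracy(chosen_logps: torch.Tensor,
                    rejected_logps: torch.Tensor) -> torch.Tensor:
    # rewards/accuracies: fraction of pairs where chosen outscores rejected.
    return (chosen_logps > rejected_logps).float().mean()
```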
Frameworks Used
Training was conducted using:
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
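A small sketch to check a local environment against these versions (importlib.metadata is in the Python standard library; version() raises PackageNotFoundError for packages that are not installed):

```python
# Compare locally installed package versions against those used in training.
from importlib.metadata import version

expected = {
    "transformers": "4.51.0",
    "torch": "2.3.1+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.21.4",
}

for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else "differs"
    print(f"{pkg}: installed {have}, trained with {want} ({status})")
```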
Potential Use Cases
Given its fine-tuning on the UltraFeedback preference dataset, this model is likely suited to tasks requiring nuanced handling of preferences and alignment with human feedback, such as:
- Reinforcement Learning from Human Feedback (RLHF) applications.
- Response generation where quality is judged by a reward model.
- Preference modeling and tasks involving ranking or selection based on implicit or explicit feedback (a minimal ranking sketch follows below).
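As a concrete example of the last use case, the sketch below ranks candidate responses by the model's mean log-likelihood over the response tokens. It assumes model and tokenizer were loaded as in the earlier snippet; masking by the prompt's token count is a common approximation, since re-tokenizing prompt + response can shift token boundaries slightly:

```python
import torch

def score(model, tokenizer, prompt: str, response: str) -> float:
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    enc = tokenizer(prompt + response, return_tensors="pt").to(model.device)
    labels = enc.input_ids.clone()
    labels[:, :prompt_len] = -100  # ignore prompt tokens in the loss
    with torch.no_grad():
        out = model(**enc, labels=labels)
    return -out.loss.item()  # higher = response is more likely

prompt = "Q: What is the capital of France?\nA: "
candidates = ["Paris.", "I am not entirely sure."]
best = max(candidates, key=lambda r: score(model, tokenizer, prompt, r))
print(best)
```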