jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer · Cold

The jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623 model is an 8 billion parameter Llama 3 base model fine-tuned by jackf857. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, building upon a previously instruction-tuned Llama 3 variant. As the model name suggests, it was trained with a preference-optimization objective (SLiC-HF) on the Ultrafeedback preference pairs, and its evaluation logs report reward-based metrics such as rewards/chosen and rewards/rejected.


Model Overview

This model, jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623, is an 8 billion parameter Llama 3 base model. It has been fine-tuned by jackf857 using the HuggingFaceH4/ultrafeedback_binarized dataset, starting from the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model.
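A minimal loading sketch using the Transformers library (assuming the checkpoint is publicly available on the Hugging Face Hub; the import is deferred inside the function so the snippet can be read and inspected without the library installed):

```python
MODEL_ID = "jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623"

def load_model(model_id: str = MODEL_ID):
    """Fetch the tokenizer and weights; returns (tokenizer, model).

    Note: loading an 8B-parameter model requires roughly 16 GB of
    memory in fp16, so this is only a sketch of the call sequence.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model
```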

Key Training Details

The model underwent a single epoch of training with a learning rate of 5e-07 and a total batch size of 128, distributed across 4 GPUs. Final evaluation metrics report a loss of 341.8599, with rewards/chosen at -260.7975 and rewards/rejected at -247.1082. The rewards/accuracies metric, i.e. the fraction of evaluation pairs in which the chosen response received a higher reward than the rejected one, reached 0.4935.
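For illustration only (this is not the author's training code): SLiC-style preference optimization applies a hinge loss to the gap between the chosen and rejected sequence log-likelihoods, and rewards/accuracies counts how often the chosen response outscores the rejected one. Both the margin `delta` and the exact reward definition below are assumptions for the sketch.

```python
def slic_hinge_loss(logp_chosen: float, logp_rejected: float, delta: float = 1.0) -> float:
    """SLiC-style calibration loss: penalize pairs where the chosen
    sequence's log-likelihood does not beat the rejected one's by at
    least the margin delta (an assumed hyperparameter)."""
    return max(0.0, delta - (logp_chosen - logp_rejected))

def reward_accuracy(pairs) -> float:
    """Fraction of (chosen_reward, rejected_reward) pairs where the
    chosen response scores strictly higher -- the quantity reported
    as rewards/accuracies in the training logs."""
    wins = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return wins / len(pairs)
```

With this definition, the reported rewards/accuracies of 0.4935 means the chosen response outscored the rejected one on just under half of the evaluation pairs.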

Frameworks Used

Training was conducted using:

  • Transformers 4.51.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.21.4
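To approximate the training environment, these versions could be pinned in a `requirements.txt` (the `+cu121` PyTorch wheel is served from the PyTorch package index, not PyPI, so an extra index URL may be needed):

```text
transformers==4.51.0
torch==2.3.1+cu121
datasets==2.21.0
tokenizers==0.21.4
```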

Potential Use Cases

Given its fine-tuning on the Ultrafeedback dataset, this model is likely suitable for tasks requiring nuanced understanding of preferences and alignment with human feedback, such as:

  • Reinforcement Learning from Human Feedback (RLHF) applications.
  • Response generation where quality is judged by a reward model.
  • Preference modeling and tasks involving ranking or selection based on implicit or explicit feedback.
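As a sketch of the ranking use case, candidate responses could be ordered by a model-derived score, for instance the average per-token log-likelihood (an assumed scoring choice; `score_response` only outlines the idea and requires a loaded checkpoint to run):

```python
def score_response(model, tokenizer, prompt: str, response: str) -> float:
    """Assumed scoring rule: average log-likelihood per token of the
    full prompt+response text (a simplification; masking prompt tokens
    would score only the response)."""
    import torch  # deferred; requires PyTorch and a loaded model

    inputs = tokenizer(prompt + response, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    # out.loss is the mean negative log-likelihood per token
    return -out.loss.item()

def rank_responses(scored):
    """Sort (response, score) pairs best-first by score."""
    return [resp for resp, score in sorted(scored, key=lambda p: p[1], reverse=True)]
```

Usage would pair each candidate with its score and call `rank_responses` to pick the top response.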