jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623
jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623 is an 8-billion-parameter Llama 3 model fine-tuned by jackf857 on the HuggingFaceH4/ultrafeedback_binarized preference dataset, building on a previously instruction-tuned Llama 3 variant. As the checkpoint name indicates, training used a SLiC-HF-style preference objective on 4 H200 GPUs with a total batch size of 128, and its evaluation logs report preference-learning metrics such as rewards/chosen and rewards/rejected.
Model Overview
This model, jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623, is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857 on the HuggingFaceH4/ultrafeedback_binarized dataset, starting from the W-61/llama-3-8b-base-sft-ultrachat-8xh200 checkpoint.
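A minimal loading and generation sketch with the transformers library follows; the bfloat16 dtype, device_map setting, and prompt are illustrative choices, not part of the original card:

```python
# Minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face
# Hub under the repository name shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260428-054623"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B parameters; bf16 keeps memory use manageable
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```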
Key Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 128, distributed across 4 GPUs. The final evaluation metrics report a loss of 341.8599, rewards/chosen of -260.7975, and rewards/rejected of -247.1082. The rewards/accuracies value of 0.4935 means the model scored the chosen response above the rejected one in roughly 49% of evaluation pairs.
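The "slic-hf" in the checkpoint name suggests a SLiC-HF-style (sequence likelihood calibration with human feedback) objective, where a response's reward is typically its total log-probability under the policy and a hinge loss enforces a margin between chosen and rejected responses. The sketch below illustrates that general recipe; it is an assumption based on the name, not the author's actual training code, and the margin delta is a placeholder:

```python
# Illustrative SLiC-style calibration loss; an assumption inferred from the
# checkpoint name, not the author's exact training code.
import torch
import torch.nn.functional as F

def slic_hinge_loss(chosen_logps: torch.Tensor,
                    rejected_logps: torch.Tensor,
                    delta: float = 1.0) -> torch.Tensor:
    # chosen_logps / rejected_logps are summed token log-probabilities for
    # each response pair; they correspond to rewards/chosen and
    # rewards/rejected in the evaluation logs.
    return F.relu(delta - (chosen_logps - rejected_logps)).mean()

def reward_accuracy(chosen_logps: torch.Tensor,
                    rejected_logps: torch.Tensor) -> torch.Tensor:
    # rewards/accuracies: fraction of pairs where chosen outscores rejected.
    return (chosen_logps > rejected_logps).float().mean()
```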
Frameworks Used
Training was conducted using:
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
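A small sketch to check a local environment against these versions (importlib.metadata is in the Python standard library; version() raises PackageNotFoundError for packages that are not installed):

```python
# Compare locally installed package versions against those used in training.
from importlib.metadata import version

expected = {
    "transformers": "4.51.0",
    "torch": "2.3.1+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.21.4",
}

for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else "differs"
    print(f"{pkg}: installed {have}, trained with {want} ({status})")
```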
Potential Use Cases
Given its fine-tuning on the UltraFeedback preference dataset, this model is likely suited to tasks requiring nuanced handling of preferences and alignment with human feedback, such as:
- Reinforcement Learning from Human Feedback (RLHF) applications.
- Response generation where quality is judged by a reward model.
- Preference modeling and tasks involving ranking or selection based on implicit or explicit feedback (a minimal ranking sketch follows below).
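As a concrete example of the last use case, the sketch below ranks candidate responses by the model's mean log-likelihood over the response tokens. It assumes model and tokenizer were loaded as in the earlier snippet; masking by the prompt's token count is a common approximation, since re-tokenizing prompt + response can shift token boundaries slightly:

```python
import torch

def score(model, tokenizer, prompt: str, response: str) -> float:
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    enc = tokenizer(prompt + response, return_tensors="pt").to(model.device)
    labels = enc.input_ids.clone()
    labels[:, :prompt_len] = -100  # ignore prompt tokens in the loss
    with torch.no_grad():
        out = model(**enc, labels=labels)
    return -out.loss.item()  # higher = response is more likely

prompt = "Q: What is the capital of France?\nA: "
candidates = ["Paris.", "I am not entirely sure."]
best = max(candidates, key=lambda r: score(model, tokenizer, prompt, r))
print(best)
```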