jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 14, 2026 · Architecture: Transformer

jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200 is an 8-billion-parameter Llama 3 base model fine-tuned by jackf857 on top of W-61/llama-3-8b-base-sft-ultrachat-8xh200. It was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset, with a preference-based training objective aimed at alignment. It is designed for general conversational AI tasks where human-feedback alignment is beneficial.


Model Overview

This model, jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200, is an 8 billion parameter variant of the Llama 3 architecture. It is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model, specifically optimized using the HuggingFaceH4/ultrafeedback_binarized dataset.

Key Characteristics

  • Base Model: Llama 3, 8 billion parameters.
  • Fine-tuning: Trained on the HuggingFaceH4/ultrafeedback_binarized dataset, which pairs chosen and rejected responses, indicating an emphasis on aligning model outputs with human preferences.
  • Training Objective: Logged metrics such as Rewards/chosen, Rewards/rejected, and Slic/rank Loss point to a SLiC-HF-style (sequence likelihood calibration from human feedback) preference-alignment objective, a lighter-weight alternative to classic RLHF.
  • Performance Metrics: Achieved a Rewards/accuracies score of 0.4919 on the evaluation set, with a final loss of 341.9101.
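The Slic/rank Loss metric above can be illustrated with a minimal sketch of the SLiC-HF rank (calibration) loss: a hinge that pushes the log-likelihood of the chosen response above that of the rejected one by a margin. The margin value `delta` here is illustrative and not taken from this model's training configuration.

```python
def slic_rank_loss(logp_chosen: float, logp_rejected: float, delta: float = 1.0) -> float:
    """Hinge loss that is zero once the chosen response's sequence
    log-likelihood exceeds the rejected one's by at least `delta`."""
    return max(0.0, delta - (logp_chosen - logp_rejected))

print(slic_rank_loss(-10.0, -12.0))  # margin satisfied -> 0.0
print(slic_rank_loss(-12.0, -10.0))  # margin violated  -> 3.0
```

In practice this loss is averaged over batches of (chosen, rejected) pairs from the preference dataset, which is why the reported metrics track chosen and rejected rewards separately.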

Training Details

The model was trained with a learning rate of 5e-07 over 1 epoch, using a total batch size of 128 across 8 GPUs. The optimizer used was ADAMW_TORCH with a cosine learning rate scheduler.
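One plausible way the total batch size of 128 decomposes across 8 GPUs is shown below; the per-device batch size and gradient-accumulation steps are assumptions for illustration, only the total (128) and GPU count (8) come from the training details.

```python
# Assumed decomposition of the effective batch size; only `num_gpus`
# and the resulting total of 128 are stated in the training details.
per_device_batch = 4   # assumed per-GPU micro-batch size
grad_accum_steps = 4   # assumed gradient-accumulation steps
num_gpus = 8           # from the training details

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # -> 128
```

Other splits (e.g. per-device batch 16 with no accumulation) would yield the same effective batch size; the card does not specify which was used.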

Intended Use Cases

This model is suitable for applications requiring a conversational AI aligned with human feedback, potentially yielding more helpful and harmless outputs. Its fine-tuning on the UltraFeedback preference data suggests its strength lies in generating responses that humans prefer in a comparative setting.
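A minimal generation sketch with Hugging Face transformers, assuming the checkpoint loads like any Llama 3 causal LM; the prompt and sampling parameters are illustrative, and this has not been verified against this specific repository.

```python
# Sketch only: assumes the repo hosts standard transformers-format weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-slic-hf-ultrafeedback-4xh200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain why the sky is blue.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the lineage is a base model with SFT and preference fine-tuning, plain-text prompting as above is a reasonable starting point; whether a chat template applies is not stated on the card.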