jackf857/llama-3-8b-base-r-dpo-ultrafeedback-4xh200
jackf857/llama-3-8b-base-r-dpo-ultrafeedback-4xh200 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200 on the HuggingFaceH4/ultrafeedback_binarized dataset. It uses Direct Preference Optimization (DPO) to align its responses with human preferences and is designed for tasks requiring nuanced, high-quality conversational output, retaining the base Llama 3 architecture with an 8192-token context length.
Overview
This model, jackf857/llama-3-8b-base-r-dpo-ultrafeedback-4xh200, is an 8-billion-parameter language model based on the Llama 3 architecture. It was fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, starting from the W-61/llama-3-8b-base-sft-ultrachat-8xh200 base model.
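A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. It assumes the checkpoint follows the standard Llama 3 layout and works with AutoModelForCausalLM; the dtype and sampling settings are illustrative choices, not taken from this card.

```python
# Hedged sketch: load the model and generate a completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-r-dpo-ultrafeedback-4xh200"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```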
Key Capabilities
- Preference Alignment: Optimized through DPO to generate responses that align with the human preferences captured in the ultrafeedback_binarized preference pairs.
- Conversational Generation: Inherits and refines capabilities for generating coherent and contextually relevant conversational outputs.
- Base Model Enhancement: Improves upon a supervised fine-tuned (SFT) Llama 3 base model, focusing on refining response quality.
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 128 across 4 GPUs, reaching a final validation loss of 0.5080. The logged DPO length metrics, r_dpo/chosen_len (291.262) and r_dpo/rejected_len (248.396), show that the chosen responses in the preference data are on average longer, and typically more detailed, than the rejected ones.
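For reference, below is a hedged sketch of how a comparable DPO run could be set up with the TRL library. The learning rate, epoch count, and effective batch size come from this card; the per-device batch size and gradient-accumulation decomposition, beta, and precision settings are assumptions, and the exact DPOTrainer argument names vary across trl versions.

```python
# Hedged sketch of the DPO fine-tuning setup using trl.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-ultrachat-8xh200"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# ultrafeedback_binarized exposes preference pairs in the train_prefs split.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="llama-3-8b-base-r-dpo-ultrafeedback",
    learning_rate=5e-7,               # from the card
    num_train_epochs=1,               # from the card
    per_device_train_batch_size=8,    # assumption: 8 x 4 GPUs x 4 accum = 128 total
    gradient_accumulation_steps=4,    # assumption (see above)
    beta=0.1,                         # assumption: a common DPO default
    bf16=True,                        # assumption
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer` in older trl releases
)
trainer.train()
```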
Intended Use Cases
This model is suitable for applications requiring high-quality, preference-aligned text generation, particularly conversational AI, chatbots, and interactive systems where response quality and human-like interaction are crucial. Because DPO training explicitly rewards chosen responses over rejected ones, the model is well suited to settings where the gap between good and bad responses matters.
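Since the SFT base was trained on ultrachat conversations, the tokenizer likely carries a chat template; the sketch below (reusing model and tokenizer from the loading example above) shows chat-style usage under that assumption.

```python
# Sketch of chat-style generation; assumes the tokenizer ships a chat
# template inherited from the ultrachat SFT stage (not confirmed here).
messages = [
    {"role": "user", "content": "Suggest three follow-up questions for a job interview."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn header
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```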