W-61/llama-3-8b-base-margin-dpo-4xh100-real
W-61/llama-3-8b-base-margin-dpo-4xh100-real is an 8-billion-parameter language model fine-tuned from princeton-nlp/Llama-3-Base-8B-SFT using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, targeting instruction following and preference alignment. It is designed for general language generation tasks where responses refined against preference feedback are beneficial.
Overview
W-61/llama-3-8b-base-margin-dpo-4xh100-real is derived from the princeton-nlp/Llama-3-Base-8B-SFT base model. It has been fine-tuned with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, which pairs each prompt with a preferred and a rejected response so that the model learns to favor higher-quality, better-aligned outputs.
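For orientation, the preference pairs can be inspected directly with the datasets library. The split and field names below (train_prefs, prompt, chosen, rejected) follow the published layout of HuggingFaceH4/ultrafeedback_binarized, but are worth verifying against the dataset card:

```python
from datasets import load_dataset

# Preference split of the DPO training data. Split and field names
# follow the HuggingFaceH4/ultrafeedback_binarized dataset card;
# verify them against the card before relying on this snippet.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = ds[0]
print(example["prompt"])        # the user instruction
print(example["chosen"][-1])    # preferred assistant message (chat format)
print(example["rejected"][-1])  # dispreferred assistant message
```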
Key Characteristics
- Base Model: Llama-3-Base-8B-SFT, providing a strong foundation for language understanding and generation.
- Fine-tuning Method: Direct Preference Optimization (DPO), which trains the model to assign higher likelihood to preferred responses than to rejected ones.
- Training Data: HuggingFaceH4/ultrafeedback_binarized dataset, a common choice for preference alignment tasks.
- Context Length: Supports an 8192 token context window.
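A minimal inference sketch with the transformers library is shown below. The dtype, device placement, and generation settings are illustrative assumptions, not values published with the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-margin-dpo-4xh100-real"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

prompt = "Explain the difference between DPO and RLHF in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; tune for your application.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```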
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and an effective batch size of 128 (a per-device batch size of 2 across 4 GPUs with 16 gradient accumulation steps, i.e. 2 × 4 × 16 = 128). The optimizer was Adam with standard betas (0.9, 0.999) and epsilon, paired with a cosine learning-rate scheduler and a 0.05 warmup ratio.
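For reference, the reported hyperparameters map onto a trl DPOTrainer configuration roughly as sketched below (API names follow recent trl versions). This is a reconstruction, not the actual training script: the DPO beta, precision flags, and any "margin" variant of the loss implied by the model name are not reported, so the values marked as assumptions are illustrative and the sketch uses standard DPO.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "princeton-nlp/Llama-3-Base-8B-SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Recent trl versions accept this conversational chosen/rejected format
# directly; older versions expect plain string columns.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Reported settings: lr 5e-07, effective batch 128 (2 per device x 4 GPUs
# x 16 accumulation), 1 epoch, Adam, cosine schedule, 0.05 warmup ratio.
args = DPOConfig(
    output_dir="llama-3-8b-base-margin-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=2,   # 2 x 4 GPUs x 16 accum = 128 total
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    beta=0.1,                        # assumption: the DPO beta is not reported
    bf16=True,                       # assumption: typical on H100 hardware
)

trainer = DPOTrainer(
    model=model,                     # ref model is created internally when omitted
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

On 4 GPUs, a script like this would typically be launched with accelerate launch --num_processes 4 train_dpo.py, matching the 4xH100 setup in the model name.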
Potential Use Cases
This model is likely suitable for applications requiring high-quality, aligned text generation, such as:
- Instruction-following chatbots.
- Content generation that adheres to specific stylistic or factual preferences.
- Tasks where human-like response quality is prioritized.