W-61/llama-3-8b-base-epsilon-dpo-ultrafeedback-8xh200

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 11, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-epsilon-dpo-ultrafeedback-8xh200 is an 8 billion parameter language model fine-tuned by W-61 from a Llama 3 base model using DPO (Direct Preference Optimization) on the HuggingFaceH4/ultrafeedback_binarized dataset. The training aligns the model's outputs with human preferences; it reaches a rewards/accuracies score of 0.6905 on the evaluation set, making it suitable for tasks that require high-quality, preference-aligned text generation.

Overview

This model is a fine-tuned variant of W-61/llama-3-8b-base-sft-ultrachat-8xh200, further optimized with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset so that its outputs align more closely with human preferences.
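
The checkpoint can be loaded with the standard Hugging Face transformers API. A minimal sketch, assuming the repository id above is reachable on the Hub (the card lists FP8 quantization for serving; this loads whatever weights the repo ships):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-epsilon-dpo-ultrafeedback-8xh200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # pick up the dtype recorded in the checkpoint config
    device_map="auto",   # requires the accelerate package
)
```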

Key Characteristics

  • Architecture: Llama 3 (transformer), fine-tuned from an SFT checkpoint.
  • Parameter Count: 8 billion parameters.
  • Context Length: 8192 tokens.
  • Optimization Method: Direct Preference Optimization (DPO).
  • Training Data: Fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset.

Performance Metrics

On the evaluation set, the model achieved the following results:

  • Loss: 0.6085
  • Rewards/accuracies: 0.6905 (the chosen response received the higher implicit reward on 69.05% of evaluation pairs; see the sketch after this list)
  • Rewards/margins: 0.2488
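
For context: in DPO, the implicit "reward" of a response is the β-scaled log-probability ratio between the policy and the frozen reference model. Rewards/accuracies is the fraction of preference pairs where the chosen response receives the higher implicit reward, and rewards/margins is the mean gap between the two. A minimal PyTorch sketch of these definitions (β is assumed; the card does not state it):

```python
import torch
import torch.nn.functional as F

beta = 0.1  # assumed DPO temperature; not stated on this card

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit rewards: beta-scaled log-ratios of policy vs. reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # DPO loss: -log sigmoid(chosen_reward - rejected_reward)
    margins = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margins).mean()

    # rewards/accuracies: fraction of pairs where the chosen response wins;
    # rewards/margins: mean reward gap between chosen and rejected.
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return loss, accuracy, margins.mean()

# Toy example with made-up per-sequence log-probabilities for two pairs.
loss, acc, margin = dpo_metrics(
    torch.tensor([-10.0, -12.0]), torch.tensor([-14.0, -11.0]),
    torch.tensor([-11.0, -12.5]), torch.tensor([-13.0, -12.0]),
)
print(f"loss={loss:.4f} rewards/accuracies={acc:.4f} rewards/margins={margin:.4f}")
```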

Training Details

The model was trained for 1 epoch using the AdamW optimizer with a learning rate of 5e-07, a per-device batch size of 4 (total effective batch size of 128 across the 8 GPUs, which implies gradient accumulation), and a cosine learning rate scheduler with a 0.1 warmup ratio.
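
These hyperparameters map directly onto TRL's DPOTrainer. The sketch below is a hypothetical reconstruction rather than the published training script: gradient_accumulation_steps=4 is inferred from the effective batch size (4 per device × 8 GPUs × 4 accumulation steps = 128), the DPO β and bf16 settings are assumptions, and it targets a recent TRL release where β is a DPOConfig field.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint named on this card.
sft_id = "W-61/llama-3-8b-base-sft-ultrachat-8xh200"
model = AutoModelForCausalLM.from_pretrained(sft_id)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# Preference pairs from the dataset named on the card.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                             split="train_prefs")

args = DPOConfig(
    output_dir="llama-3-8b-epsilon-dpo",  # placeholder path
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # inferred: 4 x 8 GPUs x 4 steps = 128
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    beta=0.1,   # assumed; the card does not state beta
    bf16=True,  # assumed for H200 hardware
)

# With no ref_model given, DPOTrainer clones the model as the frozen reference.
trainer = DPOTrainer(model=model, args=args,
                     train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```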

Intended Use Cases

Given its DPO fine-tuning on a preference dataset, this model is well-suited for applications where generating high-quality, human-preferred responses is critical. This includes tasks such as:

  • Instruction following
  • Dialogue systems
  • Content generation requiring nuanced understanding of preferences
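
For dialogue-style use, a generation sketch continuing from the loading example in the Overview section. It assumes the checkpoint ships a chat template (plausible given its ultrachat SFT lineage, but not confirmed by this card), and the sampling parameters are illustrative:

```python
# Continues from the loading sketch above (tokenizer and model in scope).
messages = [{"role": "user", "content": "Explain what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,   # illustrative sampling settings
    temperature=0.7,
    top_p=0.9,
)
# Strip the prompt tokens before decoding.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```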