W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01
W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01 is an 8-billion-parameter language model developed by W-61 and fine-tuned from a Llama 3 base model. It was further optimized with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset and is intended for generating helpful, aligned responses, with an 8192-token context window for conversational and text-generation tasks.
Overview
Developed by W-61, this is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned iteration of W-61/llama-3-8b-base-sft-hh-helpful-4xh200, optimized with Direct Preference Optimization (DPO).
Key Characteristics
- Base Model: Llama 3 8B.
- Fine-tuning: Utilizes Direct Preference Optimization (DPO) for enhanced alignment.
- Training Data: Fine-tuned on the Anthropic/hh-rlhf dataset, which focuses on helpfulness and harmlessness.
- Context Length: Supports an 8192-token context window.
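
The checkpoint should load with the standard Transformers API. The minimal sketch below assumes the weights are hosted on the Hub under the repository name above; the dtype and device placement are illustrative choices, not values taken from the README.

```python
# Minimal loading sketch (illustrative, not an official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is typical for Llama 3 8B, not stated in the README
    device_map="auto",
)
```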
Training Details
The model was trained with the following hyperparameters:
- Learning Rate: 5e-07
- Batch Size: A total training batch size of 64 (8 per device across 4 GPUs with 2 gradient accumulation steps).
- Optimizer: ADAMW_TORCH with the default betas (0.9, 0.999) and epsilon (1e-08).
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: Trained for 1 epoch.
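
For reference, a minimal sketch of this configuration using TRL's DPOTrainer follows. This is not the authors' training script: the q_t, s_star, and eta values in the model name hint at a modified objective that stock TRL does not expose, and the exact API varies by TRL version.

```python
# Hypothetical reproduction sketch with TRL's DPOTrainer (assumptions throughout).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# SFT checkpoint named in the Overview above.
sft_model_id = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"

model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

# hh-rlhf provides "chosen"/"rejected" transcripts; recent TRL versions can
# extract the shared prompt prefix automatically, while older releases need
# explicit "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

args = DPOConfig(
    output_dir="llama-3-8b-dpo-hh",
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs x 2 accumulation = 64 effective
    gradient_accumulation_steps=2,
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    optim="adamw_torch",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # recent TRL; older releases use tokenizer=...
)
trainer.train()
```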
Intended Use Cases
The README does not enumerate specific intended uses, but DPO fine-tuning on the Anthropic/hh-rlhf dataset points to helpful, safe, and aligned text generation, making the model a reasonable fit for conversational assistants and instruction-following applications.
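
Continuing from the loading sketch above, here is an illustrative generation call. The README does not specify a prompt template; the "\n\nHuman:" / "\n\nAssistant:" turn markers below follow the Anthropic/hh-rlhf transcript format and are an assumption.

```python
# Illustrative inference sketch; prompt format is an assumption based on hh-rlhf.
prompt = "\n\nHuman: How do I keep basil fresh for longer?\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```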