W-61/llama-3-8b-base-margin-dpo-hh-helpful-8xh200
W-61/llama-3-8b-base-margin-dpo-hh-helpful-8xh200 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-hh-helpful-8xh200 on the Anthropic/hh-rlhf dataset. It uses Direct Preference Optimization (DPO) to improve helpfulness, reaching a final loss of 0.4588 and a mean DPO margin of 11.1187. The model is intended for applications that need helpful, preference-aligned text generation, and builds on the Llama 3 architecture with an 8192-token context length.
Overview
This model, W-61/llama-3-8b-base-margin-dpo-hh-helpful-8xh200, is an 8-billion-parameter language model based on the Llama 3 architecture. It was fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, starting from the W-61/llama-3-8b-base-sft-hh-helpful-8xh200 supervised fine-tuned (SFT) checkpoint.
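For reference, the standard DPO objective (Rafailov et al., 2023) that this kind of fine-tuning optimizes is sketched below; here the reference policy $\pi_{\mathrm{ref}}$ would be the SFT checkpoint and $(y_w, y_l)$ the chosen and rejected responses from hh-rlhf. The "margin" variant suggested by the model name presumably adds an offset inside the sigmoid, but that detail is not documented in this card and should be treated as an assumption.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

The reported mean DPO margin most likely corresponds to this bracketed reward difference averaged over the evaluation set, though the card does not spell out how the metric is computed.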
Key Capabilities
- Helpful Response Generation: Optimized through DPO on human feedback data to produce more helpful and aligned outputs.
- Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B model.
- Context Length: Supports an 8192-token context window, allowing longer inputs and outputs to be handled in a single pass (a minimal inference sketch follows this list).
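A minimal inference sketch, assuming the weights load with the standard transformers AutoModelForCausalLM API and that the model follows the hh-rlhf "Human:/Assistant:" prompt convention (the prompt format is an assumption, not documented behaviour of this checkpoint):

```python
# Minimal inference sketch. The prompt format and sampling settings are
# assumptions; only the repository name comes from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-margin-dpo-hh-helpful-8xh200"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights fit on a single modern GPU in bf16
    device_map="auto",
)

# hh-rlhf-style prompt (assumption about the training format).
prompt = "\n\nHuman: How do I write a polite follow-up email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The model supports an 8192-token context window, so long prompts are fine.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```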
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07, a per-device batch size of 16 across 8 GPUs (effective batch size 128), and a cosine learning-rate schedule with a 0.1 warmup ratio. Final evaluation shows a loss of 0.4588 and a mean DPO margin of 11.1187, indicating a clear separation between chosen and rejected responses by the end of preference training.
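The sketch below reproduces those hyperparameters with TRL's DPOTrainer. It is illustrative only: the card does not state the TRL version, the DPO beta, the maximum sequence length used in training, the hh-rlhf subset, or how the "margin DPO" loss variant was implemented, so all of those are assumptions here.

```python
# Illustrative training sketch using trl's DPOTrainer with the hyperparameters
# reported above. beta, the dataset subset, and the "margin DPO" variant are
# NOT documented in this card and are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-hh-helpful-8xh200"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumption: the helpful-base subset; the card only says "hh-helpful".
# NOTE: hh-rlhf stores full conversations in `chosen`/`rejected`; depending on
# the trl version you may need to split out the shared prompt first.
dataset = load_dataset("Anthropic/hh-rlhf", data_dir="helpful-base", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-margin-dpo-hh-helpful",
    learning_rate=5e-7,
    per_device_train_batch_size=16,  # x 8 GPUs = effective batch size 128
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    beta=0.1,  # assumption: beta is not reported in the card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions take `tokenizer=` instead
)
trainer.train()
```

To match the reported setup, this would be launched across 8 GPUs (e.g. with `accelerate launch` or `torchrun`) so the effective batch size reaches 128.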
Good For
- Applications requiring models that generate helpful and user-aligned responses.
- Tasks where DPO-based fine-tuning is beneficial for improving model behavior based on human preferences.
- Research and development in preference-based model alignment and helpfulness.