W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 18, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337 is an 8 billion parameter language model, fine-tuned from llama-3-8b-base-sft-hh-harmless-4xh200-batch-64 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is optimized for harmlessness, aligning its responses with human preferences regarding safety and helpfulness. It processes inputs of up to 8192 tokens and is intended for applications requiring robust safety and reduced harmful outputs.
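
A minimal inference sketch, assuming the model is published under the id above on the Hugging Face Hub and loads through the standard Transformers APIs (the bf16 dtype and the Human/Assistant prompt format are assumptions based on hh-rlhf conventions, not details stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id, taken verbatim from the model name above.
model_id = "W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # safe load default; FP8 quantization typically applies at serving time
    device_map="auto",
)

# Models trained on hh-rlhf are conventionally prompted with Human/Assistant turns.
prompt = "\n\nHuman: How can I stay safe when shopping online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```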


Model Overview

W-61/llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337 is an 8 billion parameter language model derived from the Llama 3 family. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, building upon a base model (llama-3-8b-base-sft-hh-harmless-4xh200-batch-64) that had already been supervised fine-tuned for harmlessness. The DPO stage aims to further improve the model's ability to generate responses that are safe and aligned with human preferences, specifically targeting the reduction of harmful outputs.
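
For reference, DPO optimizes the following objective over preference pairs, where y_w is the preferred and y_l the rejected response. The "margin" in the model name plausibly refers to a variant that adds a target margin gamma inside the sigmoid; this interpretation is an assumption, as the card does not define it, and gamma = 0 recovers vanilla DPO:

```latex
% Standard DPO objective with an optional target margin \gamma.
% \gamma > 0 is one common reading of "margin-dpo" (an assumption);
% \gamma = 0 recovers the vanilla DPO loss.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        - \gamma
      \right)
    \right]
```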

Key Capabilities

  • Enhanced Harmlessness: Optimized through DPO on a dataset of human feedback focused on harmlessness (see the data-format sketch after this list).
  • Preference Alignment: Designed to generate responses that are more aligned with desired human preferences, particularly in safety.
  • Base Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B base model.
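
A sketch of what the preference data looks like and how it might be split into prompt/chosen/rejected fields for DPO. The harmless-base data_dir and the string-splitting heuristic are assumptions; the card does not specify the exact split or preprocessing used:

```python
from datasets import load_dataset

# Anthropic/hh-rlhf stores each example as two full transcripts,
# "chosen" and "rejected", each a string of alternating
# "\n\nHuman: ..." / "\n\nAssistant: ..." turns.
ds = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base", split="train")

def split_transcript(text):
    # Heuristic: everything up to and including the final "Assistant:"
    # turn is the prompt; the remainder is the response being compared.
    marker = "\n\nAssistant:"
    idx = text.rfind(marker) + len(marker)
    return text[:idx], text[idx:]

def to_preference_pair(example):
    prompt, chosen = split_transcript(example["chosen"])
    _, rejected = split_transcript(example["rejected"])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pairs = ds.map(to_preference_pair, remove_columns=ds.column_names)
print(pairs[0]["prompt"][:200])
```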

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64. Evaluation shows a final loss of 0.5256, with DPO reward-margin metrics (the gap between the implicit rewards of chosen and rejected responses) indicating effective preference alignment. Training used Transformers 4.51.0 and PyTorch 2.3.1+cu121.
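
A minimal sketch of how the reported hyperparameters could map onto TRL's DPOTrainer. Only the learning rate, total batch size, and epoch count come from the card; the per-device/accumulation split (4 per device x 4 accumulation x 4 GPUs = 64), the beta value, and the use of TRL itself are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# SFT base model named in the card (assumed to be loadable by this id).
base_id = "llama-3-8b-base-sft-hh-harmless-4xh200-batch-64"

model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

config = DPOConfig(
    output_dir="llama-3-8b-margin-dpo-hh-harmless",
    learning_rate=5e-7,             # from the card
    num_train_epochs=1,             # from the card
    per_device_train_batch_size=4,  # assumption: 4 x 4 x 4 GPUs = 64 total
    gradient_accumulation_steps=4,
    bf16=True,
    beta=0.1,                       # assumption: TRL's default DPO beta
)

trainer = DPOTrainer(
    model=model,                    # ref_model omitted; TRL builds a frozen copy
    args=config,
    train_dataset=pairs,            # prompt/chosen/rejected pairs from the sketch above
    processing_class=tokenizer,
)
trainer.train()
```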