jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 21, 2026 · Architecture: Transformer · Cold
The jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0 model is an 8 billion parameter language model, fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200. It was further optimized using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, focusing on harmlessness and alignment. This model is designed for applications requiring a robust and safety-aligned Llama 3 base, particularly in scenarios where mitigating harmful outputs is critical.
Model Overview
This model, jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0, is an 8 billion parameter language model derived from the Llama 3 architecture. It represents a fine-tuned iteration of the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 base model.
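The checkpoint can be loaded with the standard Hugging Face transformers API. Below is a minimal inference sketch; the bf16 dtype and the `Human:`/`Assistant:` prompt format are assumptions based on the hh-rlhf training data, not documented properties of this checkpoint (the hosted endpoint serves FP8).

```python
# Minimal inference sketch using Hugging Face transformers.
# Assumes the repo id below is reachable on the Hub; adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 local weights; the hosted endpoint uses FP8
    device_map="auto",
)

# Assumption: hh-rlhf-style dialogue format, since both the SFT and DPO stages used that dataset.
prompt = "Human: How do I stay safe online?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```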
Key Capabilities
- Harmlessness Alignment: The model has undergone further Direct Preference Optimization (DPO) using the Anthropic/hh-rlhf dataset, specifically targeting the reduction of harmful outputs and improved alignment with human preferences.
- Performance Metrics: During training, it achieved a final validation loss of 0.5214, with DPO metrics showing a margin mean of 11.8756 and a chosen log-probability of -96.2474, suggesting effective preference learning (the objective behind these metrics is shown after this list).
- Training Configuration: Trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs (a hedged reconstruction of this setup follows below).
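The reported metrics come from the standard DPO objective (Rafailov et al., 2023), under which the model's implicit reward for a response is the β-scaled log-probability ratio against a frozen reference policy:

$$
\mathcal{L}_\text{DPO} = -\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_\text{ref}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_\text{ref}(y_l \mid x)}\right)\right]
$$

The margin mean is the average gap between the implicit rewards of chosen and rejected responses; a large positive margin indicates the model has learned to consistently prefer the chosen completions.

The sketch below is a hedged reconstruction of this training stage using the trl library, not the authors' actual script. Only the base checkpoint, dataset, epoch count, learning rate, and total batch size (64 across 4 GPUs) come from the card; the β value, batch-size decomposition, and prompt-splitting preprocessing are assumptions.

```python
# Hedged reconstruction of the DPO stage described above, using trl.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

def split_prompt(example):
    # hh-rlhf stores full dialogues; split off the final assistant turn to
    # recover the (prompt, chosen, rejected) fields DPOTrainer expects.
    marker = "\n\nAssistant:"
    c, r = example["chosen"], example["rejected"]
    ci, ri = c.rfind(marker), r.rfind(marker)
    return {
        "prompt": c[: ci + len(marker)],
        "chosen": c[ci + len(marker):],
        "rejected": r[ri + len(marker):],
    }

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(split_prompt)

config = DPOConfig(
    output_dir="llama-3-8b-dpo-harmless",
    num_train_epochs=1,             # from the model card
    learning_rate=5e-7,             # from the model card
    per_device_train_batch_size=4,  # assumption: 4 x 4 accumulation x 4 GPUs = 64 total
    gradient_accumulation_steps=4,
    beta=0.1,                       # assumption: trl's default KL weight; not documented
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                    # ref_model omitted: trl builds the frozen reference copy
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,     # `tokenizer=` on older trl versions
)
trainer.train()
```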
Good For
- Safety-Critical Applications: Ideal for use cases where generating harmless and aligned text is a primary concern, such as content moderation, safe AI assistants, or educational tools.
- Further Fine-tuning: Serves as a strong, safety-aligned base model for subsequent fine-tuning on domain-specific tasks where a foundation of harmlessness is desired (see the sketch after this list).
- Research in Alignment: Useful for researchers exploring DPO techniques and the impact of the Anthropic/hh-rlhf dataset on model behavior and safety.
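For the further fine-tuning use case, one lightweight option is LoRA adaptation on top of this checkpoint. The sketch below is illustrative only: the dataset name is a placeholder and the LoRA hyperparameters are typical defaults, not recommendations from the model card.

```python
# Minimal sketch: LoRA fine-tuning on top of the safety-aligned checkpoint.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_id = "jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0"

# Hypothetical domain dataset with a "text" column; replace with your own.
train_dataset = load_dataset("your-org/your-domain-dataset", split="train")

peft_config = LoraConfig(
    r=16,                # assumption: a typical LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model_id,      # trl loads the checkpoint and tokenizer from the Hub
    args=SFTConfig(output_dir="llama-3-8b-harmless-domain-sft"),
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```

Training only low-rank adapters keeps the DPO-aligned base weights frozen, which helps preserve the harmlessness behavior the checkpoint was optimized for while adapting to the new domain.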