W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260422-051621
The W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260422-051621 model is an 8 billion parameter Llama 3-based language model, fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. This model is specifically optimized for generating harmless and helpful responses, building upon a supervised fine-tuned base. It is designed for applications requiring robust safety and alignment in conversational AI.
Overview
This model, llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260422-051621, is an 8 billion parameter variant of the Llama 3 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, which is known for its focus on helpful and harmless AI responses. The base model for this fine-tuning was W-61/llama-3-8b-base-sft-hh-harmless-4xh200, indicating a prior stage of supervised fine-tuning for similar objectives.
Key Characteristics
- Architecture: Llama 3-based, 8 billion parameters.
- Fine-tuning Method: Direct Preference Optimization (DPO).
- Dataset: Fine-tuned on the Anthropic/hh-rlhf dataset, emphasizing helpfulness and harmlessness.
- Training Details: Trained for 1 epoch with a learning rate of 5e-07, using the AdamW optimizer and a cosine learning rate scheduler. The total training batch size was 64 across 4 GPUs.
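A cosine learning rate scheduler decays the learning rate from its peak smoothly toward zero over the course of training. A minimal sketch of that decay, assuming no warmup and the listed peak rate of 5e-07 (the step count below is a hypothetical example, not taken from the README):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-07) -> float:
    """Cosine-decay learning rate: peak_lr at step 0, approaching 0 at total_steps."""
    progress = step / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With a hypothetical 1,000 optimizer steps:
start = cosine_lr(0, 1000)      # peak: 5e-07
middle = cosine_lr(500, 1000)   # halfway: ~2.5e-07
end = cosine_lr(1000, 1000)     # end: ~0.0
```

In practice the scheduler is applied per optimizer step, so with a total batch size of 64 the number of steps per epoch is the dataset size divided by 64.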
Performance Metrics
During training, the model achieved a final validation loss of 0.5422. Key DPO-specific metrics include a DPO beta of 0.0129 and a mean preference margin of 46.9367, indicating effective preference learning toward the desired response characteristics.
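The beta and margin values above come from the standard DPO objective, which pushes the policy's log-probability ratio (relative to the reference model) for the chosen response above that of the rejected one. A minimal sketch of the loss and margin computation, assuming the standard sigmoid-loss formulation of DPO; the log-probabilities below are made-up illustrative numbers, not values from this training run:

```python
import math

def dpo_loss_and_margin(policy_chosen_logp: float, policy_rejected_logp: float,
                        ref_chosen_logp: float, ref_rejected_logp: float,
                        beta: float = 0.0129):
    """Standard DPO loss: -log sigmoid(beta * (chosen ratio - rejected ratio))."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, margin

# Illustrative sequence log-probabilities only:
loss, margin = dpo_loss_and_margin(-50.0, -80.0, -55.0, -60.0)
```

A larger positive margin means the policy more strongly prefers the chosen response over the rejected one, and drives the loss toward zero.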
Intended Use Cases
This model is particularly suited for applications where generating safe, helpful, and aligned text is critical. Its DPO fine-tuning on a harmlessness-focused dataset makes it a strong candidate for:
- Safe AI assistants: Developing chatbots or virtual agents that prioritize non-toxic and ethical interactions.
- Content moderation: Assisting in filtering or generating content that adheres to safety guidelines.
- Harmful content detection: Potentially useful in identifying and mitigating harmful outputs in other systems.
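For assistant-style applications, prompts are typically formatted in the dialogue convention of the hh-rlhf dataset ("Human:" / "Assistant:" turns). A minimal formatting sketch assuming that convention; the helper name is ours, not from the README:

```python
def format_hh_prompt(turns):
    """Format (role, text) turns in hh-rlhf dialogue style, ending with an
    open 'Assistant:' turn for the model to complete."""
    parts = [f"\n\n{role}: {text}" for role, text in turns]
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([("Human", "How do I politely decline an invitation?")])
# -> "\n\nHuman: How do I politely decline an invitation?\n\nAssistant:"
```

The model then generates the assistant's reply as a continuation of this string.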
Limitations
Although this model is fine-tuned for harmlessness, outputs from any language model require continuous monitoring and evaluation to ensure they remain aligned with safety standards across diverse real-world scenarios. The provided README does not detail further limitations or broader intended uses.