W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 8k
  • Published: Apr 18, 2026
  • Architecture: Transformer

W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 is an 8-billion-parameter language model fine-tuned from llama-3-8b-base-sft-hh-helpful-4xh200-batch-64 on the Anthropic/hh-rlhf dataset. It was trained with Direct Preference Optimization (DPO) to improve helpfulness and alignment, and is intended for general language understanding and generation tasks, with a focus on producing helpful responses grounded in human feedback.


Model Overview

W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753 is a DPO fine-tune of llama-3-8b-base-sft-hh-helpful-4xh200-batch-64, optimized on preference pairs from the Anthropic/hh-rlhf dataset.

Key Characteristics

  • Base Model: Fine-tuned from llama-3-8b-base-sft-hh-helpful-4xh200-batch-64, an SFT checkpoint of Llama 3 8B.
  • Training Method: Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, aligning model outputs with human preferences for helpfulness (a minimal sketch of the DPO loss follows this list).
  • Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports an 8192-token context window.
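
For context, the sketch below implements the core DPO objective (Rafailov et al., 2023) in PyTorch, assuming per-sequence log-probabilities from the policy and a frozen reference model have already been computed. The function name, tensor shapes, and default beta are illustrative; none of them are taken from this model's actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,  # illustrative default; this card does not state beta
):
    """Direct Preference Optimization loss.

    Each tensor holds one summed log-probability per (prompt, response)
    pair in the batch.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # The loss pushes the margin (chosen minus rejected) to be large.
    margin = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margin).mean()
    return loss, margin.mean()  # corresponds to a "reward margin mean" metric
```

A rising mean reward margin during training indicates the policy increasingly separates chosen from rejected responses; this is what the margin metric reported under Training Details below tracks.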

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64. Evaluation reports a final loss of 1.7101 and a mean DPO reward margin ("Beta DPO loss margin mean") of 86.8606; a large positive margin indicates the model assigns substantially higher implicit reward to preferred responses than to rejected ones. A hedged reconstruction of an equivalent training configuration follows.
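
As a point of reference, the hyperparameters above could be expressed with Hugging Face's trl library roughly as follows. This is an assumed reconstruction, not the original training script: the per-device batch split across the 4 H200 GPUs is a guess, beta is not stated on this card, and DPOTrainer argument names vary across trl versions.

```python
# Assumed reconstruction using trl; not the original training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "llama-3-8b-base-sft-hh-helpful-4xh200-batch-64"  # SFT parent from this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# hh-rlhf rows may need mapping into the prompt/chosen/rejected columns
# that DPOTrainer expects.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-beta-dpo-hh-helpful",
    num_train_epochs=1,             # per the card
    learning_rate=5e-7,             # per the card
    per_device_train_batch_size=4,  # assumed: 4 GPUs x 4 x grad accum 4 = 64 total
    gradient_accumulation_steps=4,
    beta=0.1,                       # assumed; not stated on the card
)

trainer = DPOTrainer(
    model=model,
    args=config,            # ref_model defaults to a frozen copy of the policy
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions use tokenizer= instead
)
trainer.train()
```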

Intended Use Cases

This model is suited to applications that benefit from helpful, preference-aligned text generation, such as conversational AI and assistant-style tasks. Because it was trained with DPO on the helpful-focused portion of Anthropic/hh-rlhf (as the model name suggests), it should primarily be expected to produce responses that humans rate as helpful; harmlessness does not appear to have been a distinct training objective here. A minimal inference sketch follows.
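
The sketch below shows minimal inference with Hugging Face transformers, assuming the repository id from this card is loadable with AutoModelForCausalLM. The dtype, device settings, prompt template, and sampling parameters are illustrative, not prescribed by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from this card; assumed loadable via transformers.
model_id = "W-61/llama-3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260417-230753"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; the card lists FP8 for serving
    device_map="auto",
)

# hh-rlhf-style dialogue prompt (assumed format).
prompt = "Human: How do I write a polite follow-up email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```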