W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924
The W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924 model is an 8-billion-parameter Llama 3 base model, fine-tuned by W-61 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The fine-tuning targets helpfulness and alignment, making the model suitable for conversational AI and instruction-following tasks. It supports a context length of 8192 tokens, enough to keep longer interactions coherent and contextually relevant.
Overview
This model, developed by W-61, is an 8-billion-parameter Llama 3 base model fine-tuned with Direct Preference Optimization (DPO). Training used the Anthropic/hh-rlhf dataset, a collection of human preference comparisons emphasizing helpfulness and harmlessness, steering the model's outputs toward responses people actually prefer.
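A minimal loading sketch with Hugging Face transformers is below. It assumes the checkpoint is published on the Hub under the repository id in the title; the hosting location and dtype choice are assumptions, not facts from this card.

```python
# Minimal loading sketch (assumes the checkpoint is available on the Hugging Face
# Hub under this repository id; substitute a local path if it is hosted elsewhere).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights in bf16 fit on a single modern GPU
    device_map="auto",
)
```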
Key Capabilities
- Preference Alignment: Fine-tuned with DPO on the Anthropic/hh-rlhf dataset, which pairs preferred and rejected responses, suggesting improved helpfulness and reduced harmfulness (a sketch of the DPO loss follows this list).
- Llama 3 Architecture: Benefits from the foundational capabilities of the Llama 3 8B base model.
- Context Handling: Supports a context length of 8192 tokens, enabling it to process and generate longer, more detailed interactions.
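For context, DPO trains directly on preference pairs: it increases the policy's log-probability margin for the chosen response over the rejected one, measured relative to a frozen reference model. The sketch below is a generic illustration of that loss, not W-61's training code, and the beta value is a placeholder.

```python
# Generic DPO loss sketch (illustrative; not W-61's actual training code).
# pi_*  : summed log-probs of a response under the policy being trained
# ref_* : summed log-probs of the same response under the frozen reference model
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (pi_chosen - ref_chosen)
    rejected_reward = beta * (pi_rejected - ref_rejected)
    # Logistic loss pushes the chosen reward above the rejected reward.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```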
Training Details
The model was trained with a learning rate of 5e-07 and a total batch size of 64 using the AdamW optimizer, running on 4 GPUs with 2 gradient-accumulation steps (implying a per-device batch size of 8, since 8 × 4 × 2 = 64) for 1 epoch. The low learning rate and single epoch are typical of preference fine-tuning, which makes small, targeted adjustments to an already capable base model.
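Below is a hedged reconstruction of this configuration using TRL's DPOTrainer. The card does not name the training framework, so TRL, the DPO beta, and the dataset preprocessing are assumptions; only the hyperparameters mirror what is reported above.

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# Launch across 4 GPUs (e.g. with torchrun/accelerate) to reproduce the
# reported total batch size of 64.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",
    learning_rate=5e-7,             # as reported
    per_device_train_batch_size=8,  # 8 x 4 GPUs x 2 accum steps = 64 total
    gradient_accumulation_steps=2,  # as reported
    num_train_epochs=1,             # as reported
    optim="adamw_torch",            # AdamW, as reported
    beta=0.1,                       # placeholder; the actual DPO beta is not stated on this card
    bf16=True,
)

trainer = DPOTrainer(
    model=model,  # a frozen reference copy is created automatically when none is passed
    args=config,
    # hh-rlhf stores full chosen/rejected dialogues; recent TRL versions extract
    # the shared prompt automatically (older ones need explicit preprocessing).
    train_dataset=load_dataset("Anthropic/hh-rlhf", split="train"),
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```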
Good for
- Developing conversational AI agents that require helpful and aligned responses.
- Applications where human preference alignment is a critical factor.
- Instruction-following tasks where the model needs to adhere to specific guidelines.
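Because hh-rlhf dialogues use a plain "Human:"/"Assistant:" turn format, prompting the model in the same style is likely the best fit for these use cases. The snippet below continues the loading sketch from the Overview; the sampling settings are illustrative, not recommendations from this card.

```python
# Hedged generation sketch, reusing `tokenizer` and `model` from the loading
# example above. hh-rlhf uses "Human:"/"Assistant:" turns, so prompting in that
# style should match the fine-tuning distribution.
prompt = "\n\nHuman: Explain what DPO fine-tuning does in two sentences.\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```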