W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924 is an 8 billion parameter language model developed by W-61. It is a variant of the Llama 3 8B base model, fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model targets harmless and helpful conversational use and supports an 8192-token context window.
Overview
This model, developed by W-61, is an 8 billion parameter language model based on the Llama 3 architecture. It was fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, a preference dataset built around helpful and harmless assistant responses. Training used a learning rate of 5e-07, an effective batch size of 64, and a cosine learning-rate schedule over a single epoch.
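For orientation, the preference data referenced above can be inspected directly. The snippet below is a minimal sketch using the Hugging Face datasets library; the `chosen`/`rejected` column names are those of the public Anthropic/hh-rlhf release.

```python
from datasets import load_dataset

# Load the preference dataset this model was tuned on.
# Each example pairs a preferred ("chosen") and dispreferred ("rejected")
# Human/Assistant transcript.
hh = load_dataset("Anthropic/hh-rlhf", split="train")

example = hh[0]
print(example["chosen"][:200])    # preferred assistant transcript
print(example["rejected"][:200])  # dispreferred assistant transcript
```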
Key Capabilities
- Harmless and Helpful Responses: Optimized through DPO on the Anthropic/hh-rlhf dataset to generate responses that are aligned with safety and helpfulness guidelines.
- Llama 3 Base: Leverages the foundational capabilities of the Llama 3 8B base model.
- Context Length: Supports an 8192-token context window, enabling more extensive and coherent conversations (see the inference sketch after this list).
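The sketch below shows one way to run the model for conversation. It assumes the checkpoint is published under the repository id in the title and that the hh-rlhf-style `Human:`/`Assistant:` prompt format applies; adjust both to match your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the model name above; adjust if hosted elsewhere.
model_id = "W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-eta-0.1-s_star-0.6-20260428-045924"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# hh-rlhf-style prompt format: "\n\nHuman: ...\n\nAssistant:"
prompt = "\n\nHuman: How do I politely decline a meeting invitation?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```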
Training Details
The model was trained with a learning rate of 5e-07, a per-device batch size of 8 across 4 GPUs, and gradient accumulation over 2 steps, giving an effective batch size of 8 × 4 × 2 = 64. The optimizer was AdamW (adamw_torch), and the learning rate followed a cosine decay schedule with a warmup ratio of 0.1 over a single epoch. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
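For reference, these hyperparameters map onto a standard DPO setup. The sketch below uses TRL's DPOConfig/DPOTrainer as one plausible reconstruction; the card does not state which training framework was used, the hh-rlhf data would first need to be split into prompt/chosen/rejected fields, and the q_t/eta/s_star values in the model name suggest a modified DPO objective not covered here, so treat this as illustrative only.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B"  # Llama 3 8B base, per the card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# hh-rlhf needs preprocessing into prompt/chosen/rejected fields for
# DPOTrainer; that step is elided here for brevity.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama3-8b-dpo-hh-harmless",
    learning_rate=5e-07,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation steps = 64
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```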