W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5 is an 8 billion parameter Llama 3 base model fine-tuned by W-61. This model is specifically optimized using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, enhancing its helpfulness and alignment. With an 8192-token context length, it is designed for generating helpful and aligned responses in conversational AI applications.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5, is an 8 billion parameter variant of the Llama 3 architecture. Developed by W-61, it is a fine-tuned iteration of the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 model.

Key Capabilities

  • Preference Alignment: The model has undergone Direct Preference Optimization (DPO) using the Anthropic/hh-rlhf dataset. This training method aims to align the model's outputs more closely with human preferences, particularly for helpfulness.
  • Base Model Enhancement: It builds upon a supervised fine-tuned (SFT) Llama 3 base, further refining its response generation capabilities.
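To make the DPO objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. The beta value is illustrative only; the s_star, eta, and q_t values in the model name suggest a modified DPO variant whose exact formulation is not documented here.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): the loss shrinks as the policy
    # prefers the chosen response more strongly than the reference does
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss equals -log(0.5) = ln 2 ≈ 0.6931
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

Training then reduces this loss over the hh-rlhf preference pairs, pushing the policy's chosen/rejected log-ratio gap above the reference model's.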

Training Details

The fine-tuning process involved specific hyperparameters:

  • Learning Rate: 5e-07
  • Batch Size: A per-device train_batch_size of 8, with a total_train_batch_size of 64 reached via multi-GPU data parallelism and gradient accumulation.
  • Optimizer: AdamW with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 1 epoch.
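The batch arithmetic and learning-rate schedule above can be sketched as follows. The 4 GPUs and gradient-accumulation factor of 2 are inferred from the 4xh200 and batch-64 parts of the model name, not stated explicitly in the training config:

```python
import math

def effective_batch_size(per_device_batch, num_gpus, grad_accum_steps):
    """Total examples contributing to one optimizer step."""
    return per_device_batch * num_gpus * grad_accum_steps

def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of training, then
    cosine decay to zero, matching the hyperparameters listed above."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch_size(8, 4, 2))  # → 64
```

The learning rate peaks at 5e-07 when warmup ends and decays to zero by the final step of the single epoch.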

Intended Use Cases

Given its DPO fine-tuning on a helpfulness dataset, this model is well-suited for applications requiring:

  • Generating helpful and aligned text.
  • Conversational AI where user preference and safety are important.
  • Tasks benefiting from models trained to reduce harmful or unhelpful outputs.
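Because the preference data comes from Anthropic/hh-rlhf, prompts formatted in that dataset's alternating Human/Assistant turn style are a natural fit. A minimal formatting sketch follows; the exact template this checkpoint expects is an assumption, not documented on this card:

```python
def format_hh_prompt(turns):
    """Render (role, text) turns in the hh-rlhf style: alternating
    "\\n\\nHuman:" and "\\n\\nAssistant:" markers, ending with an open
    Assistant turn for the model to complete."""
    parts = []
    for role, text in turns:
        marker = "Human" if role == "user" else "Assistant"
        parts.append(f"\n\n{marker}: {text}")
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([("user", "How do I boil an egg?")])
print(repr(prompt))
```

The resulting string can be passed to any standard text-generation pipeline; stopping generation at the next "\n\nHuman:" marker keeps the model to a single assistant turn.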