jackf857/llama-3-8b-base-margin-dpo-hh-4xh100

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 5, 2026 · License: llama3 · Architecture: Transformer

The jackf857/llama-3-8b-base-margin-dpo-hh-4xh100 model is an 8 billion parameter Llama 3 base model, fine-tuned from W-61/llama-3-8b-base-hh-harmless-sft-4xh100. It was trained with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to align it with human preferences for helpfulness and harmlessness. This model is designed for applications requiring a robust, preference-aligned language model with an 8192 token context length.


Model Overview

The jackf857/llama-3-8b-base-margin-dpo-hh-4xh100 is an 8 billion parameter language model based on the Llama 3 architecture. It is a fine-tuned variant of the W-61/llama-3-8b-base-hh-harmless-sft-4xh100 model, specifically optimized using Direct Preference Optimization (DPO).
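For context, DPO fine-tunes the policy directly on preference pairs, with no separate reward model, by pushing the policy's log-likelihood ratio (measured against a frozen reference, here the SFT checkpoint) higher for chosen responses than for rejected ones. The standard objective is:

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

The "margin" in the model name suggests a margin-augmented variant, in which a fixed offset $\gamma$ is subtracted inside the sigmoid to enforce a minimum preference gap; the card itself does not describe this, so treat it as an inference from the name alone.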

Key Training Details

This model was trained for a single epoch on the Anthropic/hh-rlhf dataset, which pairs human preference judgments with model responses to capture helpfulness and harmlessness. Training used a learning rate of 5e-07 with a cosine learning rate scheduler and a 0.1 warmup ratio. The per-device batch size was 4 across 4 GPUs with 8 gradient accumulation steps, for an effective batch size of 4 × 4 × 8 = 128.
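The card does not name the training framework, but these hyperparameters map naturally onto TRL's DPOConfig. Below is a minimal sketch, assuming TRL and Hugging Face datasets; the beta value is an assumption, as the card does not give it:

```python
# Sketch of a DPO run matching the card's hyperparameters.
# Assumes TRL as the training framework; beta=0.1 is a guess (not in the card).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-hh-harmless-sft-4xh100"  # SFT checkpoint named in the card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="llama-3-8b-base-margin-dpo-hh",
    num_train_epochs=1,                 # single epoch, per the card
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=4,      # 4 per GPU x 4 GPUs
    gradient_accumulation_steps=8,      # 4 x 4 x 8 = 128 effective batch
    beta=0.1,                           # assumption: DPO beta is not stated in the card
)

# Note: hh-rlhf records may first need mapping into TRL's
# prompt/chosen/rejected schema, depending on the TRL version.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,         # `tokenizer=` in older TRL versions
)
trainer.train()
```

Launched under `accelerate launch` (or `torchrun`) across 4 GPUs, this reproduces the 128-sample effective batch described above.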

Intended Use Cases

Given its DPO fine-tuning on a human preference dataset, this model is likely suitable for applications where alignment with human values, particularly generating helpful and harmless responses, is critical. Developers can leverage it for tasks requiring a preference-aligned Llama 3 variant, as in the sketch below.
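Since this is a base-architecture model fine-tuned on hh-rlhf dialogues rather than an instruction-tuned chat model with a documented template, here is a minimal generation sketch using Hugging Face transformers; the Human/Assistant prompt format mirrors the hh-rlhf convention and is an assumption, not something the card specifies:

```python
# Minimal inference sketch using Hugging Face transformers.
# The Human/Assistant prompt format follows hh-rlhf convention (an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-margin-dpo-hh-4xh100"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "\n\nHuman: How do I write a polite follow-up email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```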