jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.85-4xh200-batch-64-20260421-233802

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 8k · Published: Apr 22, 2026 · Architecture: Transformer

jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.85-4xh200-batch-64-20260421-233802 is an 8-billion-parameter language model fine-tuned by jackf857 from the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 base model. It was optimized with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to improve helpfulness and alignment with human preferences, and is designed for general-purpose conversational AI tasks where helpful, aligned responses are critical.


Model Overview

This model, developed by jackf857, is an 8-billion-parameter language model based on the Llama 3 architecture. It is a fine-tuned variant of W-61/llama-3-8b-base-sft-hh-helpful-4xh200, optimized with Direct Preference Optimization (DPO).

Key Characteristics

  • Base Model: Llama 3 8B.
  • Fine-tuning: Utilizes Direct Preference Optimization (DPO) for alignment.
  • Dataset: Trained on the Anthropic/hh-rlhf dataset, which focuses on human feedback for helpfulness and harmlessness.
  • Context Length: Supports a context window of 8192 tokens.
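
As a quick orientation, the snippet below sketches loading the checkpoint for local text generation with Hugging Face transformers. It assumes the weights are available on the Hugging Face Hub under the repository id above and that bf16 inference is acceptable (the hosted endpoint advertises FP8); the prompt is purely illustrative.

```python
# Minimal sketch: load the model for text generation with transformers,
# assuming the weights are published under the same repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.85-4xh200-batch-64-20260421-233802"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights; the hosted endpoint serves FP8
    device_map="auto",
)

prompt = "Explain what Direct Preference Optimization does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt + completion within the 8192-token context window.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```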

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 H200 GPUs. Evaluation shows a final loss of 0.5131, with DPO-specific metrics indicating preference alignment, including a mean DPO reward margin of 139.8486 (the average gap between the implicit rewards of chosen and rejected responses). This fine-tuning aims to make the model's responses more helpful and better aligned with human instructions.
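
As a point of reference, the DPO objective and the margin metric can be sketched in a few lines of PyTorch. This is a minimal illustration of standard DPO, not the training code for this run; the variable names and the beta value are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the standard DPO objective. beta=0.1 is an assumed
    default, not a value reported for this run. Inputs are summed
    log-probs of the chosen/rejected completions under the policy and
    the frozen SFT reference model."""
    # Implicit rewards: beta-scaled log-probability ratios vs. the reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # The reported margin metric is the mean of this difference.
    margins = chosen_rewards - rejected_rewards

    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, margins.mean()
```

A growing margin during training indicates the policy is assigning increasingly higher relative likelihood to chosen responses than rejected ones, which is what the mean margin of 139.8486 reflects.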

Intended Use Cases

This model is suitable for applications requiring a helpful and preference-aligned language model, particularly in conversational AI, instruction following, and general text generation where human-like helpfulness is desired.
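
Because the preference data comes from Anthropic/hh-rlhf, one reasonable (but undocumented) assumption is that prompts work best in that dataset's dialogue format, with alternating "\n\nHuman:" and "\n\nAssistant:" turns. The helper below is a hypothetical sketch of that formatting, not an official chat template for this checkpoint.

```python
def format_hh_prompt(turns):
    """Format a conversation in the Anthropic hh-rlhf dialogue style.
    Assumption: this matches the data seen during SFT and DPO; the model
    card documents no official chat template."""
    text = ""
    for role, content in turns:
        text += f"\n\n{role}: {content}"
    # End with an open Assistant turn for the model to complete.
    return text + "\n\nAssistant:"

prompt = format_hh_prompt([
    ("Human", "How do I safely reheat rice?"),
    ("Assistant", "Cool it quickly, refrigerate within an hour, then reheat until steaming hot."),
    ("Human", "How long can it keep in the fridge?"),
])
# Pass `prompt` to model.generate() as in the loading sketch above.
```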