W-61/llama-3-8b-base-beta-dpo-hh-helpful-8xh200
W-61/llama-3-8b-base-beta-dpo-hh-helpful-8xh200 is an 8-billion-parameter language model developed by W-61 and fine-tuned from W-61/llama-3-8b-base-sft-hh-helpful-8xh200 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. It is designed to improve helpfulness and alignment while retaining the Llama 3 architecture and its 8192-token context window.
Model Overview
This model is an 8-billion-parameter language model developed by W-61. It is a fine-tuned iteration of W-61/llama-3-8b-base-sft-hh-helpful-8xh200, further optimized on top of that supervised fine-tuned (SFT) checkpoint using Direct Preference Optimization (DPO).
Key Characteristics
- Base Model: Built on the Llama 3 8B base architecture, via the SFT checkpoint W-61/llama-3-8b-base-sft-hh-helpful-8xh200.
- Fine-tuning: Utilizes Direct Preference Optimization (DPO) for alignment.
- Dataset: Preference-tuned on the Anthropic/hh-rlhf dataset; the "hh-helpful" in the model name suggests a focus on the helpfulness portion of the data.
- Context Length: Supports a context window of 8192 tokens (see the loading sketch after this list).
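Because this is a standard Llama 3 checkpoint, it should load with the usual Hugging Face transformers AutoClasses. A minimal sketch (the repository id comes from this card; dtype and device placement are assumptions):

```python
# Minimal loading sketch; dtype and device placement are assumptions,
# only the repository id and the 8192-token context come from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-beta-dpo-hh-helpful-8xh200"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision is typical for 8B inference
    device_map="auto",           # requires accelerate to be installed
)

# The Llama 3 architecture used here has an 8192-token context window.
print(model.config.max_position_embeddings)  # expected: 8192
```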
Training Details
The model was trained for 1 epoch at a learning rate of 5e-07 with an effective batch size of 128 across 8 GPUs. Reported evaluation metrics include a final loss of 0.6427 and a mean logged beta_dpo/gap of 20.0887; since this gap likely measures the reward margin between chosen and rejected responses, the positive value suggests the DPO phase improved preference alignment.
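The training script itself is not reproduced here, but a DPO run with these hyperparameters can be sketched with the trl library. Everything in the sketch below that is not a hyperparameter from this card (the beta value, the prompt-splitting step, the per-device batch split) is an assumption, and argument names vary across trl versions:

```python
# Hypothetical reconstruction of the DPO phase with trl. The learning rate,
# epoch count, and effective batch size (128 across 8 GPUs) come from this
# card; beta and the preprocessing are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_model_id = "W-61/llama-3-8b-base-sft-hh-helpful-8xh200"
model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

# hh-rlhf ships full transcripts in "chosen"/"rejected"; DPOTrainer expects
# separate "prompt", "chosen", and "rejected" fields, so split each pair at
# the final assistant turn (a simplification of whatever the authors did).
def split_prompt(example):
    marker = "\n\nAssistant:"
    cut = example["chosen"].rfind(marker) + len(marker)
    rcut = example["rejected"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:cut],
        "chosen": example["chosen"][cut:],
        "rejected": example["rejected"][rcut:],
    }

dataset = load_dataset("Anthropic/hh-rlhf", data_dir="helpful-base", split="train")
dataset = dataset.map(split_prompt)

config = DPOConfig(
    output_dir="llama-3-8b-base-beta-dpo-hh-helpful",
    learning_rate=5e-7,             # from this card
    num_train_epochs=1,             # from this card
    per_device_train_batch_size=4,  # 4 x 4 accumulation x 8 GPUs = 128
    gradient_accumulation_steps=4,
    beta=0.1,                       # assumption; the actual beta is not stated
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                    # a frozen reference copy is created internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,     # "tokenizer=" in older trl versions
)
trainer.train()
```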
Potential Use Cases
Given its DPO fine-tuning on a helpfulness dataset, this model is likely suitable for applications requiring:
- Helpful and aligned responses: Generating user-friendly and constructive text.
- General-purpose conversational AI: Where safety and helpfulness are priorities.
- Further research into DPO and alignment techniques: As a base for experimental work.
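For the conversational cases above, a brief generation example. The Human/Assistant prompt format mirrors the hh-rlhf transcripts and is an assumption, since this card does not specify a chat template:

```python
# Hypothetical usage example; the "Human:/Assistant:" format mirrors the
# hh-rlhf training data and is an assumption, not a documented template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-beta-dpo-hh-helpful-8xh200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "\n\nHuman: How do I politely decline a meeting invitation?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # Llama tokenizers define no pad token
)
# Print only the newly generated assistant turn.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```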