W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8

TEXT GENERATION

Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8 is an 8 billion parameter language model fine-tuned by W-61, based on the Llama 3 architecture. It was fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to enhance helpfulness and align outputs with human preferences. With a context length of 8192 tokens, it is designed for conversational AI applications that require helpful, aligned responses.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8, is an 8 billion parameter language model developed by W-61. It is a fine-tuned variant of the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 base model.

Key Characteristics

  • Base Model: Llama 3 8B architecture.
  • Fine-tuning Method: Utilizes Direct Preference Optimization (DPO).
  • Training Data: Fine-tuned on the Anthropic/hh-rlhf dataset, which is designed to improve helpfulness and reduce harmfulness through human feedback.
  • Context Length: Supports a context window of 8192 tokens.
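DPO optimizes the policy directly on preference pairs, without training a separate reward model: it pushes up the log-probability of the chosen response relative to a frozen reference model, and pushes down the rejected one. A minimal sketch of the per-pair DPO loss (this function and its toy inputs are illustrative, not taken from W-61's training code):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    pi_*  : log-prob of the response under the policy being trained
    ref_* : log-prob of the same response under the frozen reference (SFT) model
    beta  : temperature on the implicit reward (0.1 is a common default,
            not the confirmed value for this model)
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin): small when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A positive margin (policy already prefers the chosen response) gives a
# loss below log(2); a zero margin gives exactly log(2).
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

Training then averages this loss over a batch of (prompt, chosen, rejected) triples from hh-rlhf and backpropagates through the policy's log-probabilities only.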

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs (H200s, per the model name). The optimizer was AdamW with a cosine learning-rate schedule and a warmup ratio of 0.1.
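The stated schedule (linear warmup over the first 10% of steps, then cosine decay from the 5e-07 peak) can be sketched as follows; the function name and step-based formulation are illustrative, not lifted from the training code:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr over warmup_ratio of training,
    then cosine decay from base_lr down to 0.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup: ramps from 0 at step 0 to base_lr at warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps: base_lr at progress=0, 0 at progress=1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak LR is reached exactly at the end of warmup.
print(lr_at_step(100, 1000))  # 5e-07
```

With a total batch size of 64 on 4 GPUs, each GPU would process 16 examples per step (possibly via gradient accumulation, which the card does not specify).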

Intended Use Cases

This model is primarily intended for applications where generating helpful, harmless, and aligned responses is crucial. Its DPO fine-tuning on a human feedback dataset suggests suitability for:

  • Chatbots and conversational AI systems.
  • Assistants requiring helpful and preference-aligned outputs.
  • Tasks benefiting from improved instruction following and safety.
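For chatbot deployments, the 8192-token context window bounds how much conversation history can be sent per request, so older turns must be dropped once the budget is exceeded. A minimal sketch of that truncation policy (the helper name and the whitespace-based token estimate are illustrative; a real deployment would count tokens with the model's own tokenizer):

```python
def fit_context(turns, max_tokens=8192, count=lambda s: len(s.split())):
    """Keep the most recent conversation turns that fit the token budget.

    turns      : list of turn strings, oldest first
    max_tokens : context budget (8192 for this model, minus room for the reply)
    count      : token estimator; whitespace split is a rough stand-in
    """
    kept, total = [], 0
    # Walk backwards from the newest turn, keeping turns until the budget fills.
    for turn in reversed(turns):
        n = count(turn)
        if total + n > max_tokens:
            break
        kept.append(turn)
        total += n
    # Restore chronological order for the prompt.
    return list(reversed(kept))

history = ["first question", "a long detailed answer", "follow-up"]
print(fit_context(history, max_tokens=4))
```

In practice the budget passed in should also reserve headroom for the generated reply, since prompt and completion share the same 8192-token window.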