W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48 is an 8-billion-parameter language model fine-tuned by W-61. It is a DPO-tuned variant of the Llama 3 8B base model, optimized for harmlessness on the Anthropic/hh-rlhf dataset, and is intended for applications that require a robust, safety-aligned conversational AI.
Overview
This model, W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48, is an 8-billion-parameter language model developed by W-61. It is a fine-tuned version of W-61/llama-3-8b-base-sft-hh-harmless-4xh200, further optimized with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset.
Key Characteristics
- Base Model: Llama 3 8B architecture.
- Fine-tuning: Utilizes Direct Preference Optimization (DPO) for alignment.
- Safety Alignment: Trained on the Anthropic/hh-rlhf dataset, focusing on harmlessness.
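The DPO step listed above optimizes the policy directly on preference pairs (a chosen and a rejected response), anchored to the frozen SFT reference model. A minimal sketch of the per-pair loss in plain Python follows; the β value and log-probabilities are illustrative, not this card's actual hyperparameters:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference
    (SFT) model; beta controls how far the policy may drift.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written via log1p for numerical stability
    return math.log1p(math.exp(-margin))

# When the policy favors the chosen response more strongly than the
# reference does, the margin is positive and the loss shrinks.
loss = dpo_loss(-10.0, -14.0, -11.0, -12.0, beta=0.1)
```

Minimizing this loss pushes the policy to rank chosen (harmless) responses above rejected ones while the reference term keeps it close to the SFT model.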
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64, on a multi-GPU setup with 4 devices, using the AdamW optimizer (`adamw_torch`) with a cosine learning rate scheduler.
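The total batch size of 64 is the product of the per-device batch, the number of devices, and any gradient-accumulation steps. The card does not state the per-device/accumulation split, so the assignments below are hypothetical but consistent with the stated totals:

```python
def effective_batch_size(per_device_batch: int, num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Total examples contributing to each optimizer step."""
    return per_device_batch * num_devices * grad_accum_steps

# Two hypothetical splits that both yield the card's total of 64
# across the stated 4 GPUs:
a = effective_batch_size(16, 4, 1)  # 16 per device, no accumulation
b = effective_batch_size(8, 4, 2)   # 8 per device, 2 accumulation steps
```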
Intended Use Cases
This model is suitable for applications where safety and harmlessness are critical, particularly in conversational AI or content generation tasks that require adherence to ethical guidelines.
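A minimal usage sketch with the `transformers` library follows; the model ID is taken from this card, but the generation settings are illustrative and the heavy download is deferred until the function is called:

```python
MODEL_ID = "W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the DPO-tuned checkpoint and return a greedy completion."""
    # Imported locally so the module can be inspected without the
    # (large) model download.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=False)
    # Strip the prompt tokens and decode only the completion.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:],
                      skip_special_tokens=True)
```

Note this is a base-model fine-tune, so prompts are plain text rather than a chat template.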