jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85
The jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85 model is an 8 billion parameter Llama 3 base model, fine-tuned by jackf857 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset to optimize for helpfulness. The DPO stage is intended to yield more helpful and better-aligned responses than the SFT checkpoint it starts from, making the model suitable for applications that require robust, user-centric conversational AI.
Model Overview
This model, jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85, is an 8 billion parameter Llama 3 base model. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, building upon the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 model.
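A minimal inference sketch using Hugging Face transformers (the loading pattern is standard for Llama 3 checkpoints but not confirmed by this card; the dtype, device settings, and the Human/Assistant prompt framing, which mirrors hh-rlhf conventions, are illustrative assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; fp16 also works on most recent GPUs
    device_map="auto",
)

# Base-model fine-tune: plain-text prompting; a chat template is not guaranteed.
# The Human/Assistant framing mirrors the hh-rlhf training data.
prompt = "Human: How do I politely decline a meeting invitation?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```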
Key Characteristics
- Architecture: Llama 3 base model.
- Parameter Count: 8 billion parameters.
- Optimization: Fine-tuned with DPO for enhanced helpfulness and alignment.
- Training Data: Anthropic/hh-rlhf, a human preference dataset of paired chosen/rejected responses.
- Context Length: Supports an 8192-token context window.
Performance Highlights
During training, the model achieved a final validation loss of 0.5312. Key DPO metrics include a mean preference margin (dpo/margin) of 149.2117 and mean log-probabilities of -579.7042 for chosen responses (logps/chosen) versus -736.6628 for rejected responses (logps/rejected), indicating effective preference learning.
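For context, a sketch of the standard DPO objective these metrics come from (the beta value and the exact definition of the logged margin vary by training codebase and are assumptions here, not values taken from this run):

```python
import torch
import torch.nn.functional as F

def dpo_loss_and_margin(policy_chosen_logps, policy_rejected_logps,
                        ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss. `margin` is the reward gap that margin-style metrics track."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_rewards - rejected_rewards  # larger gap = stronger learned preference
    loss = -F.logsigmoid(margin).mean()         # -log sigmoid(reward gap)
    return loss, margin

# Illustrative sequence-level log-probs in the spirit of logps/chosen and logps/rejected
# (the reference-model values below are made up for the example):
loss, margin = dpo_loss_and_margin(
    torch.tensor([-579.7]), torch.tensor([-736.7]),
    torch.tensor([-600.0]), torch.tensor([-700.0]),
)
print(loss.item(), margin.item())
```

A positive margin means the policy assigns relatively more probability mass to chosen responses than the reference model does, which is what the chosen/rejected log-prob gap above reflects.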
Training Details
The training procedure used a learning rate of 5e-07, a total batch size of 64, and 1 epoch, run on a multi-GPU setup with 4 devices using the AdamW optimizer (adamw_torch) and a cosine learning rate scheduler.
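A hedged sketch of how these hyperparameters might be expressed with TRL's DPOTrainer (argument names vary across TRL versions; output_dir, beta, and the per-device batch split are assumptions, since the card only reports totals):

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"  # SFT starting point named on this card
tokenizer = AutoTokenizer.from_pretrained(base)

# hh-rlhf ships paired chosen/rejected transcripts; recent TRL versions can extract the
# shared prompt automatically, while older ones need an explicit prompt column.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",  # hypothetical name
    learning_rate=5e-7,
    num_train_epochs=1,
    per_device_train_batch_size=16,          # x 4 GPUs = total batch size 64 (assumed split)
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    beta=0.1,                                # assumed; not reported on the card
    bf16=True,
)

trainer = DPOTrainer(
    model=base,                    # DPO starts from the SFT checkpoint
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()  # launch with e.g. `accelerate launch` to use all 4 devices
```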
Intended Use Cases
Given its DPO fine-tuning on a helpfulness dataset, this model is particularly well-suited for applications where generating helpful, aligned, and user-centric responses is critical. This includes conversational AI, chatbots, and assistants designed to provide constructive and beneficial interactions.