jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.4
jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.4 is an 8-billion-parameter Llama 3 language model. It is a DPO-tuned variant of W-61/llama-3-8b-base-sft-hh-helpful-4xh200, trained on the Anthropic/hh-rlhf preference dataset, and is aimed at producing helpful, harmless responses for conversational AI and instruction-following tasks.
Model Overview
This model, published by jackf857, is an 8-billion-parameter language model built on the Llama 3 base architecture. It fine-tunes W-61/llama-3-8b-base-sft-hh-helpful-4xh200 with Direct Preference Optimization (DPO).
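The card does not include usage code, so the following is a minimal inference sketch with Hugging Face Transformers. The `\n\nHuman: ... \n\nAssistant:` prompt format is an assumption based on the Anthropic/hh-rlhf conversation style, and the dtype/device settings are illustrative, not prescribed by the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; pick a dtype your hardware supports
    device_map="auto",           # requires `accelerate`; places weights automatically
)

# Assumption: hh-rlhf conversations use the "\n\nHuman: ... \n\nAssistant:" turn format,
# so a model tuned on that data should respond well to the same prompt style.
prompt = "\n\nHuman: How do I write a polite follow-up email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```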
Key Capabilities
- Helpful and Harmless Responses: Fine-tuned on the Anthropic/hh-rlhf dataset, a collection of human preference judgments aimed at making responses more helpful and less harmful.
- DPO Training: Uses Direct Preference Optimization, which aligns the model by training directly on pairs of preferred and rejected responses rather than through a separately trained reward model (a minimal sketch of the objective follows this list).
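For reference, the DPO objective described above can be written in a few lines of PyTorch. This is a generic sketch of the standard loss (Rafailov et al., 2023), not this model's exact training code; the beta value is a common default, not one reported in the card:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1):  # assumption: 0.1 is a common default; the card does not state beta
    """Standard DPO loss over summed per-sequence log-probabilities.

    Each argument has shape (batch,): the log-probability the policy or
    frozen reference model assigns to the chosen/rejected completion.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margin).mean()
    # `margin` corresponds to the reward-margin metric typically logged during DPO training.
    return loss, margin
```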
Training Details
The model was trained for a single epoch at a learning rate of 5e-07 with a total batch size of 64 across 4 GPUs. The final training loss was 0.6083, and the logged DPO reward margins and chosen/rejected log-probabilities indicate how well the model separates preferred from rejected responses in the preference data.
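The training stack is not stated in the card. As a hedged sketch, a run with the reported hyperparameters could look like the following using TRL's DPOTrainer; the beta value, the per-device/accumulation split of the batch size, and the dataset preprocessing are all assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumption: training started from the SFT checkpoint named in the card.
base = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# hh-rlhf provides "chosen"/"rejected" conversation strings; depending on the TRL
# version, these may need to be mapped into its prompt/chosen/rejected schema.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

args = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",
    num_train_epochs=1,                 # reported: single epoch
    learning_rate=5e-7,                 # reported learning rate
    per_device_train_batch_size=16,     # assumption: 16 x 4 GPUs = 64 total; accumulation split not stated
    # beta (the DPO temperature) is not reported; TRL's default is 0.1.
)

trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```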
Intended Use Cases
The model is suited to conversational applications where responses must be both helpful and safe, such as chatbots, virtual assistants, and content generation in settings where alignment with human preferences matters.