W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3 is an 8-billion-parameter language model fine-tuned by W-61. It is a DPO-tuned variant of the Llama 3 8B base model, optimized on the Anthropic/hh-rlhf dataset and built on a previously instruction-tuned (SFT) version. The model is designed for helpful conversational tasks and supports a context length of 8192 tokens, making it suitable for longer multi-turn conversations.


Model Overview

This model, W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.3, is an 8 billion parameter language model developed by W-61. It is a fine-tuned version of W-61/llama-3-8b-base-sft-hh-helpful-4xh200, which itself was a supervised fine-tuned (SFT) variant of the Llama 3 8B base model.

Key Capabilities

  • DPO Fine-tuning: The model has undergone Direct Preference Optimization (DPO) using the Anthropic/hh-rlhf dataset. This process aims to align the model's outputs more closely with human preferences for helpfulness.
  • Helpful Responses: Building on its SFT predecessor, this DPO-tuned model is specifically geared towards generating helpful and aligned conversational outputs.
  • Llama 3 Architecture: Inherits the foundational capabilities and performance characteristics of the Llama 3 8B base model.
  • Context Length: Supports a context window of 8192 tokens, allowing for processing and generating longer sequences of text.
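Since both the SFT and DPO stages used the Anthropic/hh-rlhf dataset, prompts in the "\n\nHuman: … \n\nAssistant:" layout of that dataset are a reasonable starting point. The exact template used during training is not stated on this card, so the helper below (`format_hh_prompt` is a hypothetical name) is a sketch under that assumption:

```python
def format_hh_prompt(turns):
    """Format alternating human/assistant turns into the
    "\n\nHuman: ...\n\nAssistant: ..." layout used by the
    Anthropic/hh-rlhf dataset. The returned string ends with
    "\n\nAssistant:" to cue the model's next reply.
    NOTE: assumed prompt format; the card does not specify one."""
    roles = ["Human", "Assistant"]
    parts = [f"\n\n{roles[i % 2]}: {t}" for i, t in enumerate(turns)]
    return "".join(parts) + "\n\nAssistant:"
```

The resulting string can then be passed to a standard `transformers` text-generation pipeline; keep the combined prompt and completion within the 8192-token context window.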

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs, using the AdamW optimizer with cosine learning-rate scheduling and a 0.1 warmup ratio. This training regimen was designed to enhance the model's ability to provide helpful, preference-aligned responses.
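The reported hyperparameters can be expressed as a TRL `DPOConfig`, shown as a sketch below. Only the learning rate, epoch count, total batch size, optimizer, scheduler, and warmup ratio come from this card; the per-device split and everything else (including the DPO beta, and the `q_t`/`s_star`/`eta` values in the model name, which are not explained here) are assumptions or left at defaults:

```python
from trl import DPOConfig

# Sketch of a DPO training config matching the reported hyperparameters.
config = DPOConfig(
    learning_rate=5e-7,              # reported learning rate
    num_train_epochs=1,              # reported: 1 epoch
    per_device_train_batch_size=16,  # assumed: 16 x 4 GPUs = total batch 64
    optim="adamw_torch",             # reported: AdamW
    lr_scheduler_type="cosine",      # reported: cosine schedule
    warmup_ratio=0.1,                # reported: 0.1 warmup ratio
    output_dir="llama-3-8b-dpo-hh-helpful",  # hypothetical output path
)
```

This config would be passed to a `DPOTrainer` together with the SFT checkpoint and a preference dataset such as Anthropic/hh-rlhf.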

Intended Use Cases

This model is particularly well-suited for applications requiring:

  • Helpful AI Assistants: Generating informative and user-friendly responses in conversational agents.
  • Preference-Aligned Generation: Tasks where the output needs to adhere to specific helpfulness criteria, as learned from the HH-RLHF dataset.
  • General Text Generation: Leveraging the Llama 3 8B base capabilities for a wide range of language understanding and generation tasks, with an emphasis on helpfulness.