W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48 is an 8-billion-parameter language model published by W-61. Starting from the Llama 3 8B base model, it was fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, building on a prior Supervised Fine-Tuning (SFT) checkpoint. The model supports a context length of 8192 tokens and is tuned to generate helpful, preference-aligned responses.


Model Overview

W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48 is an 8-billion-parameter language model developed by W-61. It is a fine-tuned variant of Llama 3 8B, building directly on the earlier supervised fine-tuned checkpoint, W-61/llama-3-8b-base-sft-hh-helpful-4xh200.

Key Capabilities

  • Preference Alignment: This model has undergone an additional Direct Preference Optimization (DPO) phase using the Anthropic/hh-rlhf dataset. This training aims to align the model's outputs more closely with human preferences for helpfulness.
  • Enhanced Helpfulness: The DPO fine-tuning is specifically geared towards improving the model's ability to generate helpful and constructive responses.
  • Llama 3 Base: Benefits from the foundational capabilities and architecture of the Llama 3 8B base model.
  • Context Window: Supports a context length of 8192 tokens, allowing longer inputs and outputs to be handled in a single pass (see the loading sketch after this list).
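
As a concrete starting point, the snippet below sketches loading the model with Hugging Face Transformers and generating a response. The repo id is taken from the model name above; the dtype, device placement, and generation parameters are illustrative assumptions, not values published with the model.

```python
# Minimal inference sketch, assuming the model is hosted on the Hugging Face Hub
# under the repo id shown above. Generation settings are illustrative defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.48"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights on a single GPU
    device_map="auto",
)

# HH-style prompt format (see the formatting sketch under "Intended Use Cases").
prompt = "\n\nHuman: How do I brew a good cup of coffee?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,          # well within the 8192-token context window
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```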

Training Details

The model was trained for 1 epoch with a learning rate of 5e-07 and a total batch size of 64, using a cosine learning rate scheduler with a warmup ratio of 0.1. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
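
For readers who want to reproduce a comparable setup, the sketch below wires the stated hyperparameters into trl's DPOTrainer. The DPO beta, the per-device batch split, and the prompt-extraction heuristic are assumptions (hh-rlhf ships only full chosen/rejected dialogues, and no beta is published for this model), and exact trl argument names vary by version.

```python
# Hypothetical reproduction sketch: DPO on Anthropic/hh-rlhf with the stated
# hyperparameters. beta and the prompt-splitting heuristic are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_repo = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"  # the SFT starting point named above
tokenizer = AutoTokenizer.from_pretrained(sft_repo)
model = AutoModelForCausalLM.from_pretrained(sft_repo)

def split_prompt(example):
    # hh-rlhf stores full dialogues; recover the shared prompt by cutting at the
    # final assistant turn (a common convention, not documented for this model).
    marker = "\n\nAssistant:"
    idx = example["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:idx],
        "chosen": example["chosen"][idx:],
        "rejected": example["rejected"][example["rejected"].rfind(marker) + len(marker):],
    }

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(split_prompt)

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",
    learning_rate=5e-7,              # stated learning rate
    num_train_epochs=1,              # stated epoch count
    lr_scheduler_type="cosine",      # stated scheduler
    warmup_ratio=0.1,                # stated warmup ratio
    per_device_train_batch_size=4,   # assumption: 4 x H200 x grad accum 4 = total batch 64
    gradient_accumulation_steps=4,
    beta=0.1,                        # assumption: a typical DPO beta, not published here
    bf16=True,
)

# With ref_model omitted, trl clones the initial policy as the frozen reference.
trainer = DPOTrainer(model=model, args=config, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```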

Intended Use Cases

This model is particularly suitable for applications requiring:

  • Generating helpful and aligned text.
  • Tasks where human preference for response quality is critical.
  • Building conversational agents or assistants that prioritize helpfulness (a prompt-formatting sketch follows this list).
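
Since the model descends from a Llama 3 base model rather than an instruct variant, it most likely expects the hh-rlhf dialogue format instead of a chat template. The helper below illustrates one way to render a multi-turn conversation in that format; the format is inferred from the training data, not confirmed by this model card.

```python
# Hypothetical helper: render a conversation in the hh-rlhf "Human/Assistant"
# format used by the training data. The format is inferred, not documented here.
def format_hh_dialogue(turns: list[tuple[str, str]], user_message: str) -> str:
    """turns is a list of (human, assistant) pairs from earlier in the chat."""
    prompt = ""
    for human, assistant in turns:
        prompt += f"\n\nHuman: {human}\n\nAssistant: {assistant}"
    prompt += f"\n\nHuman: {user_message}\n\nAssistant:"
    return prompt

history = [("What's a good beginner houseplant?", "A pothos is hardy and tolerates low light.")]
print(format_hh_dialogue(history, "How often should I water it?"))
```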