W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05
W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05 is an 8-billion-parameter language model fine-tuned by W-61. It is based on the Llama 3 architecture and was further aligned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is designed for helpful conversational AI applications and supports an 8192-token context length for extended interactions.
Model Overview
This model, llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05, is an 8 billion parameter language model developed by W-61. It is a fine-tuned variant of the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 base model, specifically enhanced through Direct Preference Optimization (DPO).
Key Characteristics
- Base Architecture: Built upon the Llama 3 family of models.
- Parameter Count: Features 8 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports an 8192 token context window, enabling the processing of longer inputs and generating more coherent, extended responses.
- Fine-tuning Method: Aligned with Direct Preference Optimization (DPO), trained on the Anthropic/hh-rlhf dataset to improve the model's helpfulness and adherence to human preferences (the standard DPO objective is sketched after this list).
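For reference, DPO optimizes the policy directly on preference pairs, with no separately trained reward model. The suffix parameters in the model name (q_t-0.45, s_star-0.4, eta-0.05) appear to configure a custom DPO variant that is not documented here; the sketch below shows only the standard DPO loss from Rafailov et al. (2023):

```latex
% Standard DPO objective: \beta controls the strength of the implicit KL penalty,
% (x, y_w, y_l) is a prompt with chosen and rejected responses drawn from the
% preference dataset \mathcal{D}, and \pi_{\mathrm{ref}} is the frozen SFT reference policy.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```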
Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 GPUs, using the AdamW optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
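A minimal sketch of how this configuration might be reproduced with TRL's DPOTrainer is shown below. This is an illustrative reconstruction from the hyperparameters above, not the actual W-61 training script: the per-device batch size split, the bf16 setting, and the hh-rlhf preprocessing into prompt/chosen/rejected fields are assumptions, and the run's custom q_t/s_star/eta parameters are not represented.

```python
# Illustrative reconstruction of the training setup; not the actual W-61 script.
# Assumes a recent TRL release (processing_class argument) and an hh-rlhf dataset
# preprocessed into "prompt", "chosen", and "rejected" columns (mapping omitted).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"  # SFT starting point named above
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")
# ... map hh-rlhf transcripts into prompt/chosen/rejected pairs here ...

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",
    num_train_epochs=1,              # single epoch, as stated in the card
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=16,  # assumed split: 16 per device x 4 GPUs = 64 total
    optim="adamw_torch",
    bf16=True,                       # assumption; common for H200 training
)

trainer = DPOTrainer(
    model=model,                     # reference policy is derived from this model by default
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```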
Intended Use Cases
Given its DPO fine-tuning on a helpfulness dataset, this model is primarily intended for applications requiring helpful and aligned conversational AI. It is suitable for tasks where generating informative, safe, and user-friendly responses is crucial.
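For completeness, a minimal inference sketch using the Hugging Face transformers library follows. Because the model descends from a base (non-instruct) checkpoint, the "Human/Assistant" prompt format below is an assumption based on the hh-rlhf training data rather than a documented chat template, and the sampling settings are illustrative.

```python
# Minimal inference sketch; the model id is taken from this card, and the prompt
# format is assumed to follow the hh-rlhf "Human/Assistant" convention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.05"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; halves memory vs. fp32 on supported GPUs
    device_map="auto",
)

prompt = "\n\nHuman: How do I brew a good cup of coffee?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```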