# W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43
W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43 is an 8-billion-parameter language model fine-tuned by W-61. It is a Direct Preference Optimization (DPO) variant derived from the Llama 3 8B base model, trained on the Anthropic/hh-rlhf dataset to improve helpfulness. It is designed to generate helpful, aligned responses while building on the capabilities of its base model.
## Model Overview
This model, W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43, is an 8 billion parameter language model developed by W-61. It is a fine-tuned version of the W-61/llama-3-8b-base-sft-hh-helpful-4xh200 model, further optimized using Direct Preference Optimization (DPO).
## Key Characteristics
- Base Model: Built upon the Llama 3 8B architecture.
- Fine-tuning: Underwent DPO training on the Anthropic/hh-rlhf dataset (the standard DPO objective is shown below).
- Optimization Goal: Primarily aimed at enhancing helpfulness and alignment in its responses.
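DPO requires no explicit reward model: the policy is optimized directly on preference pairs against a frozen reference model. As a refresher (standard notation, not values specific to this run), the DPO loss from Rafailov et al. (2023) is:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

where $y_w$ and $y_l$ are the chosen and rejected responses for prompt $x$, and $\beta$ controls how far the policy may drift from the reference (here, the SFT checkpoint); the specific $\beta$ used for this model is not documented above.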
## Training Details
The model was trained with the following hyperparameters (an equivalent configuration is sketched after the list):
- Learning Rate: 5e-07
- Batch Size: 8 per device (train and eval) with 2 gradient accumulation steps across 4 GPUs, for an effective train batch size of 64.
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
- Epochs: Trained for 1 epoch.
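The sketch below reconstructs these hyperparameters as a `trl` DPO run. This is an assumption for illustration, not the actual training script: whether this model was trained with `trl` is not stated, the exact `trl` API varies by version (e.g. `processing_class` vs. `tokenizer`), the `output_dir` name is invented, and the hh-rlhf preprocessing into prompt/chosen/rejected fields is elided.

```python
# Hypothetical reconstruction of the documented hyperparameters with trl.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/llama-3-8b-base-sft-hh-helpful-4xh200"      # SFT starting point
model = AutoModelForCausalLM.from_pretrained(base_id)        # policy to optimize
ref_model = AutoModelForCausalLM.from_pretrained(base_id)    # frozen reference
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference data; converting each record into the prompt/chosen/rejected
# fields that DPOTrainer expects is omitted for brevity.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-helpful",  # hypothetical name
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 per device x 2 steps x 4 GPUs = 64
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = DPOTrainer(model=model, ref_model=ref_model, args=config,
                     train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```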
## Intended Use
This model is suitable for applications requiring helpful and aligned text generation, leveraging its DPO fine-tuning on human preference data.
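For quick experimentation, the snippet below shows one way to load and query the model with the `transformers` library. The `Human:`/`Assistant:` prompt format mirrors the hh-rlhf data and is an assumption; adjust it to match your own prompting setup.

```python
# Minimal generation sketch; assumes a standard causal LM checkpoint
# and a GPU with enough memory for an 8B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt format assumed from the hh-rlhf training data.
prompt = "Human: How do I bake a loaf of sourdough bread?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```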