W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8
W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8 is an 8-billion-parameter language model developed by W-61 and fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200. It was fine-tuned on the Anthropic/hh-rlhf dataset, targeting harmlessness and helpfulness. With a context length of 8192 tokens, it is suited to conversational AI applications that must adhere to safety guidelines.
Model Overview
This model, W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-8, is an 8-billion-parameter language model. It is a fine-tuned variant of W-61/llama-3-8b-base-sft-hh-harmless-4xh200, developed by W-61.
Key Capabilities
- Fine-tuned for Harmlessness and Helpfulness: The model was fine-tuned on the Anthropic/hh-rlhf dataset, which pairs preferred and rejected responses and is commonly used to align models with human preferences for safety and helpfulness.
- Base Model Architecture: Inherits the foundational capabilities of the Llama 3 8B base model.
- Context Length: Supports a context window of 8192 tokens, suitable for processing moderately long inputs and generating coherent responses.
Training Details
The model was trained with a learning rate of 5e-07 and a total batch size of 64 (4 GPUs × 2 gradient-accumulation steps × a per-device batch size of 8), using a cosine learning-rate scheduler with a warmup ratio of 0.1 and the AdamW optimizer with default betas and epsilon. Training ran for 1 epoch. The run used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
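The learning-rate schedule described above can be sketched in plain Python. This is a hedged illustration only: the peak rate (5e-07) and warmup ratio (0.1) come from the training details, while the total step count depends on the dataset size, which is not stated here.

```python
import math

def lr_at(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Cosine learning-rate schedule with linear warmup (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the rate ramps up over the first 100 steps, reaches 5e-07, then decays back toward zero by the final step.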
Intended Use Cases
Given its fine-tuning on the Anthropic/hh-rlhf dataset, this model is likely best suited for applications where generating safe, helpful, and harmless text is a priority. This could include chatbots, content moderation, or general-purpose conversational agents that require strong alignment with ethical guidelines.
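Because the preference data follows the hh-rlhf conversational layout (alternating "Human:" / "Assistant:" turns), prompts at inference time likely work best in the same format. The helper below is a minimal sketch; the exact prompt template the authors used is an assumption, not confirmed by this card.

```python
def build_prompt(turns):
    """Format a conversation in the hh-rlhf style: alternating
    "Human:" / "Assistant:" turns, user speaking first.
    `turns` is a list of strings (user, assistant, user, ...)."""
    prompt = ""
    for i, text in enumerate(turns):
        role = "Human" if i % 2 == 0 else "Assistant"
        prompt += f"\n\n{role}: {text}"
    # End with the Assistant tag so the model continues as the assistant.
    return prompt + "\n\nAssistant:"
```

The resulting string can be passed to any standard text-generation pipeline; the model then completes the final "Assistant:" turn.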