W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 28, 2026 · Architecture: Transformer · Cold

The W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43 model is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, with the aim of improving harmlessness and alignment with human preferences. It is intended for applications that require a robust, safety-aligned conversational model with an 8192-token context length.


Model Overview

This model, llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.43, is an 8-billion-parameter language model based on the Llama 3 architecture. It was fine-tuned from W-61/llama-3-8b-base-sft-hh-harmless-4xh200 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, which focuses on helpfulness and harmlessness.

Key Characteristics

  • Parameter Count: 8 billion.
  • Context Length: Supports an 8192-token context window.
  • Fine-tuning Method: Utilizes Direct Preference Optimization (DPO) for alignment.
  • Training Data: Fine-tuned on the Anthropic/hh-rlhf dataset, emphasizing safety and helpfulness.
  • Training Hyperparameters: Key hyperparameters include a learning rate of 5e-07, a total training batch size of 64, and training for 1 epoch.
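The DPO objective used for this fine-tune can be sketched numerically: given the total log-probabilities of the chosen and rejected responses under both the policy and the frozen SFT reference model, the loss is −log σ(β·Δ), where Δ is the difference in implicit rewards. A minimal sketch (the β value below is illustrative; this card does not state the run's actual β):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are total log-probabilities of each full response; beta
    scales the implicit reward. (beta=0.1 is illustrative only.)
    """
    # Implicit rewards: beta * log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed stably as log1p(exp(-margin)).
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference does.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
```

When policy and reference agree exactly, the margin is zero and the loss equals log 2; training pushes the margin positive, driving the loss toward zero.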

Intended Use Cases

This model is suited to applications where generating harmless, well-aligned text is a priority. Its DPO fine-tuning on the Anthropic/hh-rlhf dataset makes it a candidate for:

  • Safety-critical applications: Where avoiding harmful or biased outputs is crucial.
  • Conversational AI: For chatbots and virtual assistants that need to maintain helpful and harmless interactions.
  • Content moderation: Assisting in filtering or generating content that adheres to safety guidelines.
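Because the fine-tuning data is Anthropic/hh-rlhf, prompts at inference time plausibly follow that dataset's dialogue format of alternating `\n\nHuman:` and `\n\nAssistant:` turns. The helper below is a hypothetical sketch of that formatting, not a documented template for this model; verify against the actual tokenizer configuration before relying on it:

```python
def format_hh_prompt(turns):
    """Format a conversation in the Anthropic hh-rlhf dialogue style
    ("\n\nHuman: ..." / "\n\nAssistant: ..." turns), ending with an
    open Assistant turn for the model to complete.

    This template is an assumption based on the training dataset;
    it is not confirmed by the model card.
    """
    parts = []
    for role, text in turns:
        tag = "Human" if role == "user" else "Assistant"
        parts.append(f"\n\n{tag}: {text}")
    # Leave an open Assistant turn for generation.
    parts.append("\n\nAssistant:")
    return "".join(parts)

prompt = format_hh_prompt([
    ("user", "How do I politely decline an invitation?"),
])
```

The resulting string would then be tokenized and passed to the model for generation, with the completion read from the text after the final `Assistant:` tag.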