W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8K · Published: Apr 28, 2026 · Architecture: Transformer

W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01 is an 8-billion-parameter language model developed by W-61 and fine-tuned from a Llama-3-8B base model. This iteration applies Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, with a focus on improving harmlessness. It is designed for applications that require a robust, safety-aligned language model with an 8K context length.


Model Overview

This model, developed by W-61, is an 8-billion-parameter language model built on the Llama-3-8B architecture. It is a fine-tuned version of the W-61/llama-3-8b-base-sft-hh-harmless-4xh200 model.
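As a minimal usage sketch, the checkpoint should load through the standard Hugging Face transformers causal-LM interface that Llama-3-8B derivatives share. The prompt format (hh-rlhf-style Human/Assistant turns) and generation settings below are assumptions, not published usage instructions:

```python
# Minimal inference sketch (assumes a standard Llama-3-style causal LM checkpoint;
# the prompt format and generation settings are illustrative, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4-eta-0.01"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # FP8 serving is runtime-specific; bf16 is a safe local default
    device_map="auto",
)

# hh-rlhf-style prompts use "Human:" / "Assistant:" turns; this format is an assumption here.
prompt = "Human: How should I respond to an angry customer email?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```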

Key Differentiator

The primary distinction of this model lies in its training methodology: it has undergone Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. This fine-tuning stage aims to improve the model's harmlessness and alignment with human preferences, making it suitable for applications where safe, ethical responses are paramount.
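For context, DPO optimizes the policy directly against a frozen reference model on preference pairs, with no separate reward model. In its standard published form, with $y_w$ and $y_l$ the chosen and rejected responses, $\pi_{\mathrm{ref}}$ the reference model (here presumably the SFT checkpoint named above), and $\beta$ a strength coefficient not stated on this card, the objective is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```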

Training Details

Training used a learning rate of 5e-07, a total batch size of 64, and a cosine learning-rate scheduler with a 0.1 warmup ratio over 1 epoch. Optimization used AdamW (the beta and epsilon values are not listed here), and the run was distributed across 4 H200 GPUs.
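A hedged sketch of how this recipe could be reproduced with TRL's DPOTrainer follows. The per-device batch split (4 × 16 = 64), the β value, and the hh-rlhf prompt extraction are assumptions, and argument names vary across TRL versions (older releases take tokenizer= instead of processing_class=):

```python
# Hypothetical reproduction sketch using TRL's DPOTrainer with the card's published
# hyperparameters (lr 5e-07, total batch 64, cosine schedule, 0.1 warmup, 1 epoch).
# The beta value, batch split, and prompt extraction below are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "W-61/llama-3-8b-base-sft-hh-harmless-4xh200"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# hh-rlhf rows hold full "chosen"/"rejected" transcripts that share one prompt prefix;
# split on the last Assistant turn to get the prompt/chosen/rejected fields DPO needs.
def to_pairs(row):
    marker = "\n\nAssistant:"
    cut = row["chosen"].rfind(marker) + len(marker)
    return {
        "prompt": row["chosen"][:cut],
        "chosen": row["chosen"][cut:],
        "rejected": row["rejected"][cut:],
    }

dataset = load_dataset("Anthropic/hh-rlhf", split="train").map(to_pairs)

config = DPOConfig(
    output_dir="llama-3-8b-dpo-hh-harmless",
    learning_rate=5e-7,
    per_device_train_batch_size=16,  # 4 GPUs x 16 = total batch 64 (assumed split)
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    beta=0.1,  # illustrative DPO beta; the card does not state this value
    bf16=True,
)

trainer = DPOTrainer(
    model=model,  # the reference model is created automatically when not passed
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```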

Potential Use Cases

  • Safety-critical applications: Where generating harmless and ethically aligned content is a priority.
  • Content moderation: Assisting in filtering or generating safe text.
  • Dialogue systems: Creating chatbots or virtual assistants that prioritize non-toxic and helpful interactions.