jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85

Text Generation · Model size: 8B · Quant: FP8 · Ctx length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights · Concurrency cost: 1

This is an 8-billion-parameter language model fine-tuned by jackf857 with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. Built on Qwen3-8B-base, it is optimized for harmlessness and alignment with human preferences, and is intended for applications that require safe, preference-aligned text generation.


Model Overview

This model, developed by jackf857, is an 8-billion-parameter language model fine-tuned using Direct Preference Optimization (DPO). It builds on a Qwen3-8B-base checkpoint that was first supervised fine-tuned (SFT) for harmlessness (jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452). The DPO stage then used the Anthropic/hh-rlhf dataset to further align the model with human preferences, with a particular focus on generating harmless responses.

Key Characteristics

  • Base Model: Qwen3-8B-base architecture.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Dataset: Anthropic/hh-rlhf, emphasizing harmlessness.
  • Parameter Count: 8 billion parameters.
  • Context Length: 32768 tokens.
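A minimal loading sketch with Hugging Face transformers, assuming the checkpoint is published under the repo id in the title and loads through the standard auto classes (the FP8 quantization noted above needs compatible hardware and kernels; bfloat16 is shown here as a portable fallback):

```python
# Minimal loading sketch; the repo id and loading path are assumptions
# based on this card, not verified against the published weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # portable fallback; FP8 requires suitable kernel support
    device_map="auto",
)
```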

Training Details

The model underwent a single epoch of DPO training with a learning rate of 5e-07, a total batch size of 64, and a cosine learning rate scheduler. Evaluation metrics, including the DPO loss and the reward margin, track how consistently the model ranks chosen responses above rejected ones.
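For reference, DPO minimizes

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right],$$

where the term inside the sigmoid is the reported margin. Below is a hedged sketch of how the stated hyperparameters could map onto TRL's DPOTrainer; the beta value, the per-device batch split across the 4×H200 setup, and any dataset preprocessing are assumptions, since the author's actual training script is not published here.

```python
# Hedged sketch: maps the stated hyperparameters (1 epoch, lr 5e-07,
# total batch size 64, cosine schedule) onto TRL's DPOTrainer.
# beta and the per-device/accumulation split are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_model_id = "jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452"
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)
model = AutoModelForCausalLM.from_pretrained(sft_model_id)

# Anthropic/hh-rlhf provides paired "chosen"/"rejected" conversations;
# recent TRL versions can extract the shared prompt prefix automatically.
train_dataset = load_dataset("Anthropic/hh-rlhf", split="train")

config = DPOConfig(
    output_dir="qwen3-8b-dpo-hh-harmless",
    num_train_epochs=1,              # stated: single epoch
    learning_rate=5e-7,              # stated: 5e-07
    lr_scheduler_type="cosine",      # stated: cosine scheduler
    per_device_train_batch_size=4,   # assumption: 4 per GPU x 4 GPUs x 4 accum = 64 total
    gradient_accumulation_steps=4,
    beta=0.1,                        # assumption: TRL's default DPO beta
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```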

Intended Use Cases

This model is particularly suited for applications where generating safe, non-toxic, and preference-aligned text is crucial. Its DPO fine-tuning on a harmlessness dataset makes it a strong candidate for conversational AI, content moderation, and other scenarios requiring robust safety guardrails.
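Continuing from the loading sketch above, a short generation example; the `\n\nHuman:` / `\n\nAssistant:` framing follows the hh-rlhf turn format and is an assumption based on the training data rather than a documented chat template:

```python
# Hedged generation sketch, reusing `tokenizer` and `model` from the
# loading example; the hh-rlhf-style prompt framing is an assumption.
prompt = "\n\nHuman: How do I politely decline a meeting invitation?\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```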