jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732
jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732 is an 8-billion-parameter, Qwen3-based language model fine-tuned with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. Optimized for helpfulness and built on a supervised fine-tuned base, it supports a 32K context length and targets general-purpose conversational AI where helpful, aligned responses are critical.
Model Overview
jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732 is an 8-billion-parameter language model based on the Qwen3 architecture. Starting from a supervised fine-tuned checkpoint, it was further trained with Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, aligning its outputs with human preferences for helpful responses.
Key Characteristics
- Architecture: Qwen3-based, 8 billion parameters.
- Fine-tuning: Utilizes Direct Preference Optimization (DPO) for alignment.
- Dataset: Fine-tuned on the Anthropic/hh-rlhf dataset, emphasizing helpfulness.
- Context Length: Supports a context window of 32,768 tokens.
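For reference, a minimal loading sketch using Hugging Face Transformers (the library version pinned under Training Details below). The model id is from this card; the dtype and device placement are illustrative assumptions, not settings the card specifies:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732"

# Load the tokenizer and the 8B model; bf16 keeps the weights at roughly 16 GB.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: any supported dtype works
    device_map="auto",           # assumption: requires accelerate; spreads across GPUs
)
```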
Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 64 across 4 devices. Training reached a final loss of 0.6505 and a mean DPO reward gap (reported as "Beta Dpo/gap Mean") of 25.7183, indicating that the policy learned to separate chosen from rejected responses. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
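For context, here is a minimal sketch of the DPO objective these metrics come from. This is illustrative PyTorch, not the actual training code, and the `beta` default is an assumption (the card does not state its value):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss from per-sequence log-probs under the policy and a frozen reference."""
    # Implicit rewards are beta-scaled log-ratios against the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # The chosen-minus-rejected margin is the "gap" tracked during training.
    gap = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(gap).mean()
    return loss, gap.mean()
```

A growing mean gap means the policy assigns increasingly higher implicit reward to the chosen (helpful) responses than to the rejected ones, which is what the reported gap metric reflects.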
Intended Use Cases
This model is suited to applications requiring a helpful, aligned conversational AI, particularly where responses must adhere to human preferences for assistance and utility.
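A hedged generation example for such conversational use, reusing the `model` and `tokenizer` from the loading sketch above. It assumes the tokenizer ships a chat template; if it does not, format the prompt manually:

```python
messages = [
    {"role": "user", "content": "How do I politely decline a meeting invitation?"}
]

# Render the conversation with the tokenizer's chat template, if one is defined.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a response; the prompt plus output must fit the 32,768-token window.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```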