jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732 is an 8-billion-parameter Qwen3-based language model, fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is optimized for helpfulness, building on a supervised fine-tuned base. It supports a 32K context length and is designed for general-purpose conversational AI where helpful, aligned responses are critical.


Model Overview

This model, jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732, is an 8 billion parameter language model based on the Qwen3 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, specifically targeting helpfulness in its responses. This DPO fine-tuning process aims to align the model's outputs with human preferences for helpfulness, building upon a previously supervised fine-tuned version.

Key Characteristics

  • Architecture: Qwen3-based, 8 billion parameters.
  • Fine-tuning: Utilizes Direct Preference Optimization (DPO) for alignment.
  • Dataset: Fine-tuned on the Anthropic/hh-rlhf dataset, emphasizing helpfulness.
  • Context Length: Supports a context window of 32,768 tokens.
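The DPO fine-tuning described above optimizes a simple contrastive objective over preference pairs: push the policy's log-probability of the chosen response up relative to a frozen reference model, and the rejected response down. A minimal sketch in plain Python (the log-probability values and the beta of 0.1 are illustrative, not taken from this run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    gap = beta * (chosen_ratio - rejected_ratio)   # implicit reward margin
    return -math.log(1.0 / (1.0 + math.exp(-gap)))  # -log(sigmoid(gap))

# Before any learning, policy == reference, so the gap is 0 and the
# loss is -log(0.5) ≈ 0.6931 -- close to this run's final loss of 0.6505.
print(dpo_loss(-40.0, -55.0, -40.0, -55.0))
```

As the policy starts preferring chosen responses, the gap grows and the loss falls below log 2, which is what the reported final loss of 0.6505 reflects.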

Training Details

The model underwent a single epoch of training with a learning rate of 5e-07 and a total batch size of 64 across 4 devices (4x H200 GPUs, per the model name). The run reached a final loss of 0.6505 and a mean DPO reward gap (the logged beta_dpo/gap_mean metric) of 25.7183, indicating that the policy learned to separate preferred from rejected responses. Training used Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
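The reported hyperparameters can be collected into a training configuration. The sketch below is illustrative only: the actual training script is not published, the per-device batch size assumes an even split of the total across devices with no gradient accumulation, and the beta value is a common default, not a reported figure.

```python
# Hypothetical DPO training configuration assembled from the reported
# hyperparameters. Values marked "assumed" are NOT from the model card.
dpo_config = {
    "learning_rate": 5e-7,              # reported
    "num_train_epochs": 1,              # reported: single epoch
    "num_devices": 4,                   # "4xh200" in the model name
    "per_device_train_batch_size": 16,  # assumed even split of the total
    "total_train_batch_size": 64,       # reported
    "beta": 0.1,                        # DPO temperature -- assumed placeholder
    "max_length": 32768,                # model context window
}

# Sanity check: per-device batch size times devices recovers the total.
assert dpo_config["per_device_train_batch_size"] * dpo_config["num_devices"] \
    == dpo_config["total_train_batch_size"]
```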

Intended Use Cases

This model is suitable for applications requiring a helpful and aligned conversational AI, particularly in scenarios where generating responses that adhere to human preferences for assistance and utility is important.
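For conversational use, Qwen-family models conventionally expect the ChatML prompt format. Whether this particular checkpoint follows it (versus the plainer Human:/Assistant: style of the hh-rlhf data) is an assumption; in practice, tokenizer.apply_chat_template from the transformers library applies whatever template ships with the checkpoint. A hand-rolled ChatML sketch, for illustration:

```python
def format_chatml(messages):
    """Format a list of {role, content} dicts as a ChatML prompt,
    ending with an open assistant turn for the model to complete.

    Assumes the checkpoint uses the standard Qwen ChatML template.
    """
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"},
])
```

The resulting string can then be tokenized and passed to the model for generation; prefer the tokenizer's built-in chat template over manual formatting when available.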