jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105

Text generation · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

This is an 8 billion parameter Qwen3-based language model developed by jackf857, fine-tuned using DPO on the Anthropic/hh-rlhf dataset. It is specifically optimized for harmlessness and safety, building upon a supervised fine-tuned base model. The model has a context length of 32768 tokens and is designed for applications requiring robust, safety-aligned text generation.


Model Overview

This model, jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105, is an 8 billion parameter Qwen3-based language model. It has been fine-tuned using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset, specifically targeting harmlessness and safety. This DPO fine-tuning builds upon a previously supervised fine-tuned (SFT) base model, enhancing its ability to generate safe and non-toxic responses.
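For reference, DPO trains directly on the (chosen, rejected) preference pairs in hh-rlhf rather than on a separately learned reward model. For a prompt $x$ with preferred response $y_w$ and rejected response $y_l$, the standard DPO objective is

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right],
$$

where $\pi_{\text{ref}}$ is the frozen SFT model and $\beta$ controls how far the policy may move away from it; the exact $\beta$ used for this run is not stated on this card.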

Key Characteristics

  • Base Model: Qwen3-8B architecture.
  • Fine-tuning Method: Direct Preference Optimization (DPO).
  • Dataset: Anthropic/hh-rlhf, focused on harmlessness.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Performance Metrics: On the evaluation set the model reached a validation loss of 0.7256 and a mean DPO reward gap (`beta_dpo/gap_mean`) of 9.9202, i.e. chosen (harmless) responses receive a substantially higher implicit reward than rejected ones (see the sketch after this list).
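As a rough illustration of what that gap metric measures, here is a hypothetical helper that computes the implicit DPO reward margin for a single preference pair. The `beta` default and the exact definition logged during this run are assumptions; the card does not state them.

```python
def dpo_reward_gap(policy_logp_chosen: float, ref_logp_chosen: float,
                   policy_logp_rejected: float, ref_logp_rejected: float,
                   beta: float = 0.1) -> float:
    """Implicit DPO reward margin for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of the full response
    under the policy or the frozen reference (SFT) model. `beta` is an
    assumed default; the value used for this run is not published.
    """
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    return chosen_reward - rejected_reward
```

A reported `beta_dpo/gap_mean` of ~9.9 would then be this margin averaged over the evaluation pairs.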

Training Details

The model was trained with a learning rate of 5e-07, a total batch size of 64 (across 4 H200 GPUs), a cosine learning rate scheduler with a 0.1 warmup ratio, and a single epoch of training. Training ran for 600 optimizer steps, over which the DPO metrics improved steadily.
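The card does not say which training framework was used. Purely as a hypothetical reproduction sketch, the hyperparameters above map onto TRL's `DPOTrainer` roughly as follows; the SFT checkpoint name is a placeholder, the per-device batch split is assumed, and argument names can differ between TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder: the actual SFT checkpoint used as the starting point is not published on this card.
sft_model_id = "Qwen/Qwen3-8B-Base"

model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

# hh-rlhf stores full Human/Assistant conversations in "chosen"/"rejected";
# the shared prompt must be split out into its own column before DPO training.
raw = load_dataset("Anthropic/hh-rlhf", data_dir="harmless-base", split="train")

config = DPOConfig(
    output_dir="qwen3-8b-base-dpo-hh-harmless",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # assumed split: 4 GPUs x 4 samples x 4 accumulation steps = 64 total
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                 # without an explicit ref_model, TRL uses a frozen copy of `model`
    args=config,
    train_dataset=raw,           # after converting to prompt/chosen/rejected columns
    processing_class=tokenizer,  # older TRL versions take `tokenizer=` instead
)
trainer.train()
```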

Intended Use Cases

This model is particularly well-suited for applications where generating harmless, safe, and ethically aligned text is critical. It can be used in scenarios requiring content moderation, safe AI assistants, or any application where mitigating harmful outputs is a priority.
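A minimal inference sketch with the Hugging Face `transformers` library, assuming the checkpoint loads like a standard Qwen3 causal LM. The Human/Assistant prompt format below mirrors the hh-rlhf dataset and is an assumption about how the SFT/DPO stages formatted conversations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or "auto" to match the stored precision
    device_map="auto",
)

# hh-rlhf-style turn markers; swap in tokenizer.apply_chat_template(...) if the
# checkpoint ships a chat template.
prompt = "\n\nHuman: How can I politely decline an invitation?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```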