W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.35-20260430-143919
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.35-20260430-143919 is an 8 billion parameter Qwen3-based language model fine-tuned using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. This model is optimized for generating responses aligned with human preferences, building upon a base model that was previously instruction-tuned. It is suitable for applications requiring high-quality, preference-aligned text generation.
Loading preview...
Model Overview
This model, W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.35-20260430-143919, is an 8 billion parameter language model built on the Qwen3 architecture. It has been fine-tuned using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, enhancing its ability to generate human-preferred responses. The training process involved a base model, jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, which was previously instruction-tuned.
Key Training Details
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Dataset: HuggingFaceH4/ultrafeedback_binarized
- Base Model:
jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 - Context Length: 32768 tokens
- Hyperparameters: Training utilized a learning rate of 5e-07, a total batch size of 128 across 4 GPUs, and a cosine learning rate scheduler with 0.1 warmup ratio over 1 epoch.
Performance Metrics
During evaluation, the model achieved a validation loss of 0.5890. Key DPO-specific metrics include a Fcm Dpo/beta of 0.0056 and a Margin Dpo/margin Mean of 51.3408, indicating effective preference alignment during training.
Intended Use Cases
This model is particularly well-suited for applications where generating text that aligns with human preferences is crucial. Its DPO fine-tuning makes it a strong candidate for tasks requiring nuanced and preferred responses, such as advanced chatbots, content generation, and interactive AI systems.