W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125 is an 8 billion parameter language model, fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. This model was further optimized using Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, enhancing its ability to align with human preferences. With a context length of 32768 tokens, it is designed for conversational AI and instruction-following tasks where human-like responses are critical.
Model Overview
This model, W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125, is an 8 billion parameter language model. It is a fine-tuned iteration of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, specifically optimized using Direct Preference Optimization (DPO).
Key Capabilities
- Preference Alignment: Enhanced through DPO training on the HuggingFaceH4/ultrafeedback_binarized dataset, suggesting improved alignment with human preferences and instruction following.
- Base Architecture: Built upon a Qwen3-8B base, providing a robust foundation for various natural language processing tasks.
- Context Length: Supports a substantial context window of 32768 tokens, enabling the model to process longer inputs and generate more coherent, extended responses (see the loading sketch after this list).
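As a concrete starting point, the snippet below is a minimal sketch of loading the model and generating a single chat response with the Hugging Face transformers library. It assumes the checkpoint ships a chat template inherited from its Qwen3 base/SFT lineage, which has not been verified here; the prompt content is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125"

# Load the tokenizer and model; device_map="auto" spreads weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the tokenizer's chat template
# (assumed to be inherited from the Qwen3 base/SFT checkpoint).
messages = [{"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```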
Training Details
The model was trained for a single epoch with a learning rate of 5e-07 and a total batch size of 128 across 4 GPUs. The optimizer was AdamW (the adamw_torch implementation) with a cosine learning rate schedule and a warmup ratio of 0.1. This regimen aims to refine the model's conversational abilities and response quality.
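For readers who want to reproduce a comparable run, the following is a hedged sketch of this DPO stage using the trl library's DPOTrainer, which recent trl releases can point directly at the chosen/rejected conversational format of ultrafeedback_binarized. The card states only the totals, so the per-device batch size and gradient-accumulation split below are assumptions that merely multiply out to the stated total batch size of 128 on 4 GPUs; the exact trl version and any further arguments (e.g. the DPO beta) used for this checkpoint are not documented.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference pairs; "train_prefs" is the DPO split of ultrafeedback_binarized.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Hyperparameters taken from the card; the per-device batch size and
# gradient-accumulation steps are assumptions chosen so that
# 4 GPUs * 4 per device * 8 accumulation = 128 total.
config = DPOConfig(
    output_dir="qwen3-8b-dpo-ultrafeedback",
    num_train_epochs=1,
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # assumption
    gradient_accumulation_steps=8,   # assumption
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```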
Good for
- Conversational AI: Its DPO fine-tuning makes it suitable for chatbots and interactive agents that require nuanced, human-aligned responses.
- Instruction Following: Expected to perform well in tasks where precise adherence to user instructions is crucial.
- Applications requiring longer context: The 32K context window is beneficial for summarizing long documents, sustaining extended dialogue, or reasoning over large texts (a prompt-budgeting sketch follows this list).
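To illustrate the last point, here is a small sketch of budgeting a long-document summarization prompt against the 32K window before generation; long_report.txt and the 512-token generation budget are hypothetical placeholders.

```python
from transformers import AutoTokenizer

model_id = "W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.6-20260430-165125"
tokenizer = AutoTokenizer.from_pretrained(model_id)

MAX_CONTEXT = 32768  # context window stated on the card

with open("long_report.txt") as f:  # hypothetical input document
    document = f.read()

messages = [{"role": "user", "content": f"Summarize the following report:\n\n{document}"}]
# With tokenize=True (the default), apply_chat_template returns a list of token ids.
token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Leave headroom inside the 32K window for the generated summary.
max_new_tokens = 512
if len(token_ids) + max_new_tokens > MAX_CONTEXT:
    raise ValueError(
        f"Prompt uses {len(token_ids)} tokens; it must fit in {MAX_CONTEXT - max_new_tokens}."
    )
```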