W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457 is an 8 billion parameter language model, fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. This model was further optimized using the HuggingFaceH4/ultrafeedback_binarized dataset, focusing on improving response quality and alignment. It features a 32768 token context length, making it suitable for tasks requiring extensive contextual understanding and refined conversational abilities.
Model Overview
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457 is a fine-tuned iteration of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, optimized on the HuggingFaceH4/ultrafeedback_binarized preference dataset. This training stage aims to improve the quality and alignment of the model's responses while building on the instruction-following abilities of the SFT base.
Training Details
The model underwent a single epoch of training with a learning rate of 5e-07. Key hyperparameters include:
- Optimizer: `ADAMW_TORCH` with `betas=(0.9, 0.999)` and `epsilon=1e-08`.
- Batch Size: a total training batch size of 128, from a `train_batch_size` of 4 and `gradient_accumulation_steps` of 8 across 4 GPUs.
- LR Scheduler: cosine scheduler with a warmup ratio of 0.1.
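The effective batch size follows directly from these settings; a minimal sketch of the arithmetic (the variable names mirror the Trainer argument names, but the computation itself is just multiplication):

```python
# Effective (total) batch size = per-device batch * grad accumulation * GPU count.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 4

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128, matching the card's total training batch size
```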
Framework Versions
The training utilized:
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
Potential Use Cases
Given its fine-tuning on an ultrafeedback dataset, this model is likely suitable for applications requiring improved conversational quality, instruction following, and general response refinement. Its 8 billion parameters and 32768 token context length suggest capabilities for handling complex prompts and generating coherent, extended outputs.
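A minimal inference sketch using the Transformers chat-template API is below. The repo id is taken from the card's title; the prompt and generation settings are illustrative assumptions, and the heavy model load is wrapped in a function so the sketch can be read without triggering a download.

```python
# Repo id as stated in this card's title.
MODEL_ID = (
    "W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128"
    "-q_t-0.45-s_star-0.5-20260430-194457"
)


def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate one chat reply (downloads weights on first call)."""
    # Imported lazily so the sketch can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


# Example call (requires GPU memory for an 8B model):
# print(generate_reply("Summarize the benefits of preference tuning in two sentences."))
```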