W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457
W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457 is an 8 billion parameter language model, fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. This model was further optimized using the HuggingFaceH4/ultrafeedback_binarized dataset, focusing on improving response quality and alignment. It features a 32768 token context length, making it suitable for tasks requiring extensive contextual understanding and refined conversational abilities.
Model Overview
qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457 is a fine-tuned iteration of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, optimized on the HuggingFaceH4/ultrafeedback_binarized preference dataset. This training stage aims to improve the quality and alignment of the model's responses while building on the instruction-following abilities of the SFT base.
Training Details
The model underwent a single epoch of training with a learning rate of 5e-07. Key hyperparameters include:
- Optimizer: `ADAMW_TORCH` with `betas=(0.9, 0.999)` and `epsilon=1e-08`.
- Batch Size: a total training batch size of 128, from a `train_batch_size` of 4 and `gradient_accumulation_steps` of 8 across 4 GPUs.
- LR Scheduler: cosine scheduler with a warmup ratio of 0.1.
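The effective batch size follows directly from these settings; a minimal sketch of the arithmetic (the variable names mirror the Trainer argument names, but the computation itself is just multiplication):

```python
# Effective (total) batch size = per-device batch * grad accumulation * GPU count.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 4

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128, matching the card's total training batch size
```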
Framework Versions
The training utilized:
- Transformers 4.51.0
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.21.4
Potential Use Cases
Given its fine-tuning on an ultrafeedback dataset, this model is likely suitable for applications requiring improved conversational quality, instruction following, and general response refinement. Its 8 billion parameters and 32768 token context length suggest capabilities for handling complex prompts and generating coherent, extended outputs.
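A minimal inference sketch using the Transformers chat-template API is below. The repo id is taken from the card's title; the prompt and generation settings are illustrative assumptions, and the heavy model load is wrapped in a function so the sketch can be read without triggering a download.

```python
# Repo id as stated in this card's title.
MODEL_ID = (
    "W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128"
    "-q_t-0.45-s_star-0.5-20260430-194457"
)


def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate one chat reply (downloads weights on first call)."""
    # Imported lazily so the sketch can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )


# Example call (requires GPU memory for an 8B model):
# print(generate_reply("Summarize the benefits of preference tuning in two sentences."))
```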