W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 30, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.5-20260430-194457 is an 8-billion-parameter language model fine-tuned from jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. It was further optimized with direct preference optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset to improve response quality and alignment, and it supports a 32,768-token context length, making it suitable for tasks that require extensive context and refined conversational ability.


Model Overview

This model is a DPO fine-tune of jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, trained on the HuggingFaceH4/ultrafeedback_binarized preference dataset. The preference-optimization stage builds on the SFT base's capabilities, with the goal of producing higher-quality, better-aligned responses.
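As a quick-start illustration, the snippet below loads the model with transformers and runs a short generation. This is a minimal sketch: the repo id assumes the checkpoint is published under the card's title as a Hugging Face hub path (the real path may differ), and it assumes the tokenizer ships a chat template, as Qwen3 tokenizers typically do.

```python
# Quick-start inference sketch; the repo id is an assumption based on the
# card's title and may not match the actual hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = ("W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-"
           "q_t-0.45-s_star-0.5-20260430-194457")

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)

# Format a single-turn chat prompt (assumes the tokenizer has a chat template).
messages = [{"role": "user",
             "content": "Explain preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```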

Training Details

The model was trained for a single epoch with a learning rate of 5e-07. Key hyperparameters, also captured in the training sketch after this list:

  • Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08.
  • Batch Size: A total training batch size of 128, from a per-device train_batch_size of 4 with gradient_accumulation_steps of 8 across 4 GPUs (4 × 8 × 4 = 128).
  • LR Scheduler: Cosine scheduler with a warmup ratio of 0.1.
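
For concreteness, here is a minimal sketch of what this configuration could look like with trl's DPOTrainer. The use of trl, the train_prefs split, and the output directory name are assumptions; the card does not include the actual training script, and DPOTrainer's signature varies across trl versions.

```python
# Hypothetical reconstruction of the DPO stage from the hyperparameters
# above; the actual training script is not published with this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# ultrafeedback_binarized ships chosen/rejected preference pairs; the
# train_prefs split name is an assumption based on the public dataset.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized",
                       split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-dpo-ultrafeedback",  # hypothetical name
    num_train_epochs=1,                # single epoch, per the card
    learning_rate=5e-07,
    per_device_train_batch_size=4,     # 4 per GPU ...
    gradient_accumulation_steps=8,     # ... x8 accumulation x4 GPUs = 128 total
    optim="adamw_torch",               # betas=(0.9, 0.999), eps=1e-08 are defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl releases take tokenizer= instead
)
trainer.train()
```

Note that the listed betas and epsilon match AdamW's defaults, so they need no explicit setting here.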

Framework Versions

The training utilized:

  • Transformers 4.51.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.21.4

Potential Use Cases

Given its fine-tuning on the UltraFeedback preference dataset, this model is likely suited to applications that need improved conversational quality, instruction following, and general response refinement. Its 8 billion parameters and 32,768-token context length suggest it can handle complex prompts and generate coherent, extended outputs.
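
To verify the advertised context window programmatically, one can inspect the model config; a small sketch reusing the assumed repo id from the quick-start example above:

```python
# Sanity-check the 32k context claim; repo_id is the assumed hub path
# used in the quick-start sketch.
from transformers import AutoConfig

repo_id = ("W-61/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-"
           "q_t-0.45-s_star-0.5-20260430-194457")
config = AutoConfig.from_pretrained(repo_id)
print(config.max_position_embeddings)  # expected: 32768, per this card
```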