W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855 is an 8 billion parameter language model developed by W-61, fine-tuned from a Qwen3-8B base model. This iteration is optimized on the UltraFeedback dataset with an Identity Preference Optimization (IPO) training procedure, with the goal of improving alignment and response quality. It is designed for applications that require refined conversational ability and adherence to user preferences, building on the capabilities of its base model.


Model Overview

This model, qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855, is an 8 billion parameter language model developed by W-61. It is a fine-tuned version of the W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 base model, specifically optimized using the HuggingFaceH4/ultrafeedback_binarized dataset.
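The checkpoint can be loaded like any other causal language model in the Transformers library. The sketch below is a minimal usage example; it assumes the repository ships a standard tokenizer with a chat template inherited from the SFT stage, and the prompt and sampling settings are illustrative rather than recommended defaults.

```python
# Minimal inference sketch; assumes standard Transformers loading for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```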

Key Capabilities

  • Preference Alignment: Fine-tuned with the Ultrafeedback dataset, indicating an emphasis on aligning model responses with human preferences.
  • Improved Response Quality: The Identity Preference Optimization (IPO) training procedure aims to enhance the overall quality and helpfulness of generated text (see the loss sketch after this list).
  • Base Model Foundation: Builds upon the capabilities of the Qwen3-8B architecture, suggesting strong general language understanding and generation.
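
To make the IPO objective concrete, the sketch below computes the pairwise IPO loss from per-sequence log-probabilities under the policy and a frozen reference model, following Azar et al. (2023). The function signature, tensor names, and beta value are illustrative assumptions, not details taken from this run's configuration.

```python
# Illustrative IPO loss over a batch of preference pairs; names and beta are assumptions.
import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Squared-error IPO objective from per-sequence log-probabilities."""
    # Log-ratio of policy vs. reference for chosen and rejected responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # IPO regresses the preference margin toward 1/(2*beta),
    # instead of passing it through a sigmoid as DPO does.
    margin = chosen_logratio - rejected_logratio
    return ((margin - 1 / (2 * beta)) ** 2).mean()

# Toy batch of per-sequence log-probabilities.
loss = ipo_loss(torch.tensor([-1.8, -2.1]), torch.tensor([-2.5, -2.9]),
                torch.tensor([-2.0, -2.2]), torch.tensor([-2.6, -3.0]))
print(loss)
```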

Training Details

The model was trained for a single epoch with a learning rate of 5e-07 in a distributed setup across 4 H200 GPUs, with an effective batch size of 128, the AdamW optimizer, and a cosine learning rate scheduler. Key metrics on the evaluation set include a reward accuracy of 0.6940 and a chosen-response log-probability of -1.7890, reflecting how well the model separates preferred from rejected responses.
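
A run along these lines could be expressed with TRL's `DPOTrainer`, which implements the IPO objective via `loss_type="ipo"`. The sketch below mirrors the stated hyperparameters (1 epoch, learning rate 5e-07, cosine schedule, effective batch size 128), but the exact training script, per-device batch split, IPO beta, and dataset split are assumptions.

```python
# Hypothetical reproduction sketch using TRL's DPOTrainer with the IPO loss;
# hyperparameters mirror the card, everything else is assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-base-ipo-ultrafeedback",
    loss_type="ipo",                 # IPO objective instead of the default DPO sigmoid loss
    beta=0.1,                        # assumed; not stated on the card
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    per_device_train_batch_size=8,   # 4 GPUs x 8 x grad accumulation 4 = effective batch 128 (assumed split)
    gradient_accumulation_steps=4,
)

trainer = DPOTrainer(model=model, args=config, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```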