W-61/qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855 is an 8-billion-parameter language model, fine-tuned from W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 on the HuggingFaceH4/ultrafeedback_binarized dataset. The model is optimized for conversational response generation and achieves a rewards accuracy of 0.4945 on its evaluation set. It is intended for applications that require refined dialogue capabilities and preference alignment.


Model Overview

This model, W-61/qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855, is an 8-billion-parameter language model. It is a fine-tuned version of the W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 base model, trained on the HuggingFaceH4/ultrafeedback_binarized dataset.
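
A minimal inference sketch follows, assuming the standard transformers chat workflow and that the repository ships a chat template (typical for models fine-tuned from an UltraChat SFT checkpoint); the prompt and generation parameters are illustrative, not prescribed.

```python
# Minimal inference sketch. Assumes the repo provides a chat template;
# the prompt and sampling settings below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a single-turn conversation and render it with the chat template.
messages = [{"role": "user", "content": "Summarize what preference alignment does for a chat model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```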

Key Characteristics

  • Fine-tuned for Preference Alignment: The model was fine-tuned on the UltraFeedback dataset, whose paired chosen/rejected responses are designed to align model outputs with human preferences.
  • Evaluation Metrics: Achieved a rewards accuracy of 0.4945 and a rewards margin of -12.0516 on the evaluation set; rewards accuracy measures how often the model's implicit reward ranks the chosen response above the rejected one (see the definitions after this list).
  • Training Configuration: Trained with a learning rate of 5e-07, an effective batch size of 128, a cosine learning-rate scheduler with a warmup ratio of 0.1, and 1 epoch (a hedged reproduction sketch follows below).
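
The card does not define these rewards metrics; a common convention, used by TRL's DPO-family trainers, scores each response by an implicit reward and compares the chosen response $y_w$ against the rejected response $y_l$:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
$$

$$
\text{rewards accuracy} = \mathbb{E}\left[\mathbf{1}\{r_\theta(x, y_w) > r_\theta(x, y_l)\}\right], \qquad
\text{rewards margin} = \mathbb{E}\left[r_\theta(x, y_w) - r_\theta(x, y_l)\right]
$$

Under this convention, the reported 0.4945 means the tuned model's implicit reward prefers the chosen response in roughly half of the evaluation pairs.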

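The slic-hf tag in the model name suggests the SLiC-HF objective. Below is a hedged reproduction sketch, assuming TRL's DPOTrainer with its hinge (SLiC-style) loss; the learning rate, total batch size of 128, scheduler, warmup ratio, and epoch count come from this card, while beta, the per-device/accumulation split across the 4 GPUs, and the loss choice itself are assumptions.

```python
# Hedged reproduction sketch of the preference-tuning stage using TRL.
# Values marked "from the card" are stated above; everything else is assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference pairs (prompt / chosen / rejected) for DPO-style training.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

config = DPOConfig(
    output_dir="qwen3-8b-slic-hf-ultrafeedback",
    loss_type="hinge",               # SLiC-style hinge loss (assumed from the model name)
    beta=0.1,                        # assumed; not stated on the card
    learning_rate=5e-7,              # from the card
    per_device_train_batch_size=8,   # assumed split: 8 x 4 GPUs x 4 accumulation = 128 total
    gradient_accumulation_steps=4,
    num_train_epochs=1,              # from the card
    lr_scheduler_type="cosine",      # from the card
    warmup_ratio=0.1,                # from the card
)

trainer = DPOTrainer(
    model=model,                     # reference model is created automatically if omitted
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```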
Intended Use Cases

This model is suited to applications that require improved conversational quality and adherence to user preferences, particularly dialogue systems and interactive AI agents where response quality and alignment are critical. Its fine-tuning on a preference dataset targets outputs that are more desirable and contextually appropriate.