pltops/qwen2_7B-dis-wspo-full_E1

TEXT GENERATION

Concurrency Cost: 1 · Model Size: 7.6B · Quantization: FP8 · Context Length: 32k · Published: May 2, 2026 · Architecture: Transformer

The pltops/qwen2_7B-dis-wspo-full_E1 model is a 7.6 billion parameter language model, fine-tuned from wh-zhu/qwen2_1.5B-ultrachatfeedback-dpo. It was further trained on the ultrafeedback dataset, suggesting a focus on improving response quality and alignment. This makes it suitable for applications requiring refined conversational abilities and adherence to user preferences.


Model Overview

The pltops/qwen2_7B-dis-wspo-full_E1 is a 7.6 billion parameter language model, representing a fine-tuned iteration of the wh-zhu/qwen2_1.5B-ultrachatfeedback-dpo base model. This model has undergone further training on the ultrafeedback dataset, indicating an emphasis on enhancing its ability to generate high-quality, aligned, and helpful responses based on human feedback.

Key Characteristics

  • Base Model: Fine-tuned from wh-zhu/qwen2_1.5B-ultrachatfeedback-dpo.
  • Parameter Count: 7.6 billion parameters, offering a balance between performance and computational efficiency.
  • Training Data: Utilizes the ultrafeedback dataset, suggesting a focus on improving conversational quality and alignment through preference learning.
  • Context Length: Supports a context length of 32768 tokens, enabling processing of longer inputs and generating more coherent, extended outputs.

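Because the context window is a hard limit of 32768 tokens, callers typically budget the prompt and the planned generation together before sending a request. The helper below is an illustrative sketch (`fits_in_context` is a hypothetical name, not part of any published API) of that bookkeeping:

```python
CONTEXT_LENGTH = 32768  # the model's maximum context window, in tokens

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Check that the prompt plus the planned generation budget
    fits inside the model's context window."""
    return prompt_tokens + max_new_tokens <= context_length

# A 30,000-token prompt leaves room for up to 2,768 new tokens.
print(fits_in_context(30_000, 2_768))  # True
print(fits_in_context(30_000, 3_000))  # False
```

In practice the prompt token count would come from the model's own tokenizer, since token boundaries differ between models.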
Training Details

Training used a learning rate of 1e-06, a total batch size of 32, and a single epoch. Optimization was performed with Adam (the specific beta and epsilon values are not reproduced here), under a cosine learning-rate scheduler with a warmup ratio of 0.1. This configuration aims for stable and effective fine-tuning.
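The schedule described above, linear warmup over the first 10% of steps followed by cosine decay, can be sketched as a small function. This is a generic illustration of that schedule shape, not code extracted from the actual training run:

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 1e-6, warmup_ratio: float = 0.1) -> float:
    """Cosine learning-rate schedule with linear warmup.

    Ramps linearly from 0 to peak_lr over the first warmup_ratio
    fraction of training, then decays back toward 0 on a cosine curve.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000
print(lr_at_step(0, total))     # 0.0 (start of warmup)
print(lr_at_step(100, total))   # 1e-06 (peak, end of warmup)
print(lr_at_step(1000, total))  # ~0.0 (end of decay)
```

Libraries such as Hugging Face `transformers` provide an equivalent scheduler out of the box (`get_cosine_schedule_with_warmup`), so this would not normally be hand-rolled.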

Potential Use Cases

This model is likely well-suited for applications requiring:

  • Improved Conversational AI: Generating more natural and contextually relevant dialogue.
  • Content Generation: Creating high-quality text that aligns with user preferences.
  • Instruction Following: Better adherence to complex instructions due to feedback-driven training.

Further details on specific intended uses, limitations, and comprehensive evaluation data are pending.