yunjae-won/mpq3_qwen4bi_sft_dpo_beta1e-1_step512

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Apr 6, 2026Architecture:Transformer Warm

The yunjae-won/mpq3_qwen4bi_sft_dpo_beta1e-1_step512 is a 4 billion parameter language model developed by yunjae-won. This model is a fine-tuned version, likely optimized for specific instruction-following or dialogue tasks through Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). With a context length of 32768 tokens, it is suitable for applications requiring processing of moderately long inputs and generating coherent, contextually relevant responses.

Loading preview...

Model Overview

The yunjae-won/mpq3_qwen4bi_sft_dpo_beta1e-1_step512 is a 4 billion parameter language model developed by yunjae-won. This model has undergone a specific fine-tuning process, incorporating both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) with a beta value of 1e-1, trained over 512 steps. While specific details on its training data and exact optimizations are not provided in the model card, the methodology suggests an emphasis on aligning the model's outputs with human preferences and instructions.

Key Capabilities

  • Instruction Following: The SFT and DPO training indicate a focus on improving the model's ability to understand and execute instructions effectively.
  • Preference Alignment: DPO training aims to enhance the model's responses to be more helpful, harmless, and aligned with desired conversational styles.
  • Extended Context: With a context length of 32768 tokens, the model can process and generate responses based on substantial amounts of input text, making it suitable for tasks requiring broader contextual understanding.

Good for

  • Dialogue Systems: Its fine-tuning for instruction following and preference alignment makes it potentially well-suited for chatbots and conversational AI.
  • Content Generation: The ability to handle long contexts could be beneficial for generating longer-form content that requires maintaining coherence over extended passages.
  • Research and Experimentation: Developers interested in exploring the effects of SFT and DPO on 4B parameter models with extended context will find this model a valuable base.