sajjadiba/Qwen3-0.6B-Chat-SFT-ultrachat3k-DPO-argilla6k
sajjadiba/Qwen3-0.6B-Chat-SFT-ultrachat3k-DPO-argilla6k is a 0.8 billion parameter chat model fine-tuned from the Qwen/Qwen3-0.6B architecture. This model has undergone supervised fine-tuning (SFT) on 3000 samples from ultrachat_200k and further optimized with Direct Preference Optimization (DPO) using 6000 samples from argilla/distilabel-intel-orca-dpo-pairs. It is designed for conversational AI tasks, offering a compact solution for chat applications with a 32768 token context length.
Loading preview...
Model Overview
This model, sajjadiba/Qwen3-0.6B-Chat-SFT-ultrachat3k-DPO-argilla6k, is a compact 0.8 billion parameter chat-optimized language model built upon the Qwen/Qwen3-0.6B base architecture. It has been specifically fine-tuned for conversational applications through a two-stage process.
Training Methodology
- Supervised Fine-Tuning (SFT): The model initially underwent SFT using 3000 samples sourced from the
ultrachat_200kdataset, enhancing its ability to follow instructions and generate coherent responses. - Direct Preference Optimization (DPO): Further refinement was achieved through DPO, utilizing 6000 samples from the
argilla/distilabel-intel-orca-dpo-pairsdataset. This step aims to align the model's outputs with human preferences, improving response quality and helpfulness.
Key Characteristics
- Chat-Optimized: Designed for interactive conversational use cases.
- Compact Size: At 0.8 billion parameters, it offers a balance between performance and computational efficiency.
- Extended Context: Supports a context length of 32768 tokens, allowing for longer conversations.
Important Considerations
WARNING: This model is a research artifact and lacks safety guardrails. It may produce harmful, dangerous, or inappropriate content. It is not suitable for public deployment without implementing robust safety measures such as content filtering and supervised moderation.