millennium-qu/DirtyKing
millennium-qu/DirtyKing is a 4 billion parameter Qwen3-based instruction-tuned model, post-trained using Direct Preference Optimization (DPO) for specific Chinese conversational preferences. It specializes in generating "rude" or "dirty" language, maintaining strong language capabilities while enhancing human-like responses and expressive power in particular scenarios. This model is designed for applications requiring a distinct, unfiltered conversational style.
Loading preview...
Overview
DirtyKing is a 4 billion parameter model developed by millennium-qu, built upon the Qwen/Qwen3-4B-Instruct-2507 base model. It has undergone Direct Preference Optimization (DPO) post-training to align with a specific, unfiltered Chinese conversational style. The model is explicitly designed to generate "rude" or "dirty" language, while retaining the base model's strong language generation capabilities.
Key Capabilities
- Specialized Conversational Style: Optimized for generating responses with a "rude" and "unfiltered" tone in Chinese.
- Enhanced Human-like Interaction: Improves the naturalness and expressive power of replies in specific conversational contexts.
- DPO Alignment: Utilizes Direct Preference Optimization with datasets like Karsh-CAI/btfChinese-DPO-small and
dpo_mix_zhfor fine-tuning. - Efficient Training: Trained using LLaMA-Factory on 4x NVIDIA RTX 4090 GPUs with BF16 precision.
Good For
- Applications requiring a model that can generate intentionally "rude" or "blunt" Chinese dialogue.
- Simulating specific character personas in conversational AI where an unfiltered communication style is desired.
- Research into preference alignment for niche conversational styles.