zzoceanpie/Qwen3-1.7B-Yukari-DPO
zzoceanpie/Qwen3-1.7B-Yukari-DPO is a language model with roughly 1.7 billion parameters, based on the Qwen3-1.7B architecture and fine-tuned for role-playing the character Yukari Yakumo from Touhou Project. It uses Direct Preference Optimization (DPO) to reduce offensive outputs, verbosity, and formulaic generation. The model is designed for character-driven conversational AI and lets users control emotional nuance through input tags.
Overview
Qwen3-1.7B-Yukari-DPO is derived from the Qwen/Qwen3-1.7B base model (roughly 1.7 billion parameters). It is fine-tuned for role-playing the character Yukari Yakumo from the Touhou Project and builds on the earlier Qwen3-1.7B-Yukari-SFT-v2 checkpoint.
Key Differentiators
- Character Role-play: Optimized for emulating Yukari Yakumo, a character from Touhou Project, making it a fan-made derivative work.
- DPO Alignment: Uses Direct Preference Optimization (DPO) to align the model's outputs with desired preferences, specifically to:
  - Eliminate offensive content.
  - Reduce overly verbose responses.
  - Minimize formulaic or generic generation patterns.
- Emotional Control: Supports input tags based on an 8-dimensional Plutchik emotion vector (joy, anger, sadness, fear, disgust, surprise, trust, anticipation), allowing users to control the emotional tone of Yukari's responses.
- Training Method: Fine-tuned using QLoRA 4-bit NF4 with DPO (sigmoid loss) over 3 epochs, achieving a DPO accuracy of 87.5%.
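The sigmoid DPO loss mentioned in the training bullet can be illustrated with a minimal, dependency-free sketch. This is not the author's training code: the function names, the `beta` value, and the toy log-probabilities are all assumptions made for illustration. It also assumes that "DPO accuracy" means the fraction of preference pairs in which the chosen response receives a higher implicit reward than the rejected one, which is a common convention but is not confirmed by this card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_sigmoid_loss(policy_chosen, policy_rejected,
                     ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen / rejected response
    under the trained policy and the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the model card.
    """
    # Implicit rewards: how much more the policy likes a response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    loss = -log_sigmoid(chosen_reward - rejected_reward)
    return loss, chosen_reward, rejected_reward


def preference_accuracy(pairs, beta=0.1):
    """Fraction of pairs where the chosen reward exceeds the rejected reward."""
    correct = 0
    for pair in pairs:
        _, chosen_reward, rejected_reward = dpo_sigmoid_loss(*pair, beta=beta)
        if chosen_reward > rejected_reward:
            correct += 1
    return correct / len(pairs)


# Toy log-probabilities (policy_chosen, policy_rejected, ref_chosen, ref_rejected).
toy_pairs = [
    (-10.0, -12.0, -11.0, -11.0),
    (-8.0, -9.5, -8.5, -9.0),
    (-5.0, -4.0, -5.0, -5.0),   # policy prefers the rejected response here
    (-7.0, -9.0, -7.5, -8.0),
]
accuracy = preference_accuracy(toy_pairs)
```

With identical policy and reference log-probabilities the loss equals log 2, which is the starting point of training; the loss falls as the policy separates chosen from rejected responses more strongly than the reference does.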
Usage
The model expects a specific input format: [emotion tags]\nUser input. Emotion tags are structured as [<|emotion_level_emotion_name|>...], where emotion_level ranges from "none" to "extremely strong" for each of the eight emotions. Inference is supported with both the transformers library and llama.cpp (GGUF format).
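The input format described above can be assembled with a small helper such as the following. This is a hedged sketch: `build_prompt` and its parameters are hypothetical names, and the exact tag spelling (for example how a multi-word level like "extremely strong" is written inside the tag, the names of intermediate levels, and whether all eight tags must always be present) is an assumption that should be checked against the model's tokenizer and original model card.

```python
# The eight Plutchik emotions, in the order listed in this card.
EMOTIONS = [
    "joy", "anger", "sadness", "fear",
    "disgust", "surprise", "trust", "anticipation",
]


def build_prompt(user_input, levels=None, default_level="none"):
    """Return '[<|level_emotion|>...]\\nUser input' for the given levels.

    `levels` maps an emotion name to a level string. Emotions that are not
    listed fall back to `default_level`. Level strings are passed through
    unchanged; their exact spelling is an assumption, not confirmed here.
    """
    levels = levels or {}
    unknown = set(levels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    tags = "".join(
        f"<|{levels.get(emotion, default_level)}_{emotion}|>"
        for emotion in EMOTIONS
    )
    return f"[{tags}]\n{user_input}"


prompt = build_prompt("What are you planning today?", {"joy": "extremely strong"})
```

The resulting string would then be passed as the user message to whichever runtime is used (the transformers chat template or a llama.cpp prompt).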
Licensing
- Model weights are released under the Apache 2.0 License.
- The character "Yukari Yakumo" is copyrighted by Team Shanghai Alice / ZUN, and this model adheres to the Touhou Project's fan-made derivative work guidelines.