Overview
Overview
puwaer/Doujinshi-14b-chat is a 14 billion parameter large language model built upon the Qwen/Qwen3-14B architecture. It has undergone extensive continuous pre-training, DPO (Direct Preference Optimization), and SFT (Supervised Fine-Tuning) specifically for R18 content. The model was trained using a substantial 4 billion token dataset, meticulously scraped from Japanese R18 platforms like dmm.co.jp and dlsite.com.
Key Capabilities
- Specialized R18 Content Generation: Designed from the ground up to understand and generate content related to R18 themes.
- Chat-Optimized Interaction: This specific variant is fine-tuned for conversational use, making it adept at natural dialogue and smooth user interactions.
- Japanese Language Focus: The training data sources indicate a strong focus on Japanese R18 content and language nuances.
Model Variants
This model is part of a family of specialized Doujinshi-14b models, each with a distinct focus:
- Doujinshi-14b-chat: Optimized for conversational interaction and free-talk.
- Doujinshi-14b-instruct: Geared towards information provision, question answering, and instruction-following.
- Doujinshi-14b-roleplay: Specialized for character role-playing, maintaining first-person perspectives and character-specific speech patterns for immersive experiences.
Training Data
The model's training leveraged several proprietary datasets, including:
puwaer/dlsite-jp-v1,v2,v3puwaer/dmm-fanza-jp-v1,v2,v3puwaer/Doujinshi-sft-dataset-v1puwaer/Doujinshi-dpo-dataset-v1
License
The model is released under the Apache 2.0 License.