Overview
ArliAI/QwQ-32B-ArliAI-RpR-v3: Roleplay with Reasoning
QwQ-32B-ArliAI-RpR-v3 is the latest 32-billion-parameter model from ArliAI, built on the QwQ-32B base and carrying forward the dataset curation and training methods of the RPMax series. It introduces significant improvements for roleplay and creative writing, particularly in maintaining reasoning ability across long, multi-turn conversations.
Key Differentiators & Improvements (v3):
- Enhanced Creativity & Out-of-the-Box Thinking: Tuned for highly creative, out-of-the-box output, shedding limitations inherited from earlier base models.
- Refined Reasoning: The RpR dataset generation was re-run so that thinking tokens consistently match the model's actual responses, fixing the "dissociated thoughts" seen in earlier versions.
- Eliminated Refusals & Nonsense Words: Dataset generation now uses QwQ-abliterated to prevent refusals, and the misplaced censoring attempts in open datasets that produced stray nonsense words have been fixed.
- Optimized Training: Uses the REX learning-rate scheduler, which holds the learning rate higher for longer during decay so the model picks up more nuance from the data.
- Unique RP Dataset: The RPMax dataset is reprocessed into a reasoning dataset by using the base QwQ Instruct model to generate a reasoning trace for every conversation turn, keeping multi-turn RP coherent.
- Context-Aware Training: Reasoning blocks from earlier turns were excluded from the model's context during training, mirroring how the model is used at inference (see the sketch after this list) for consistent performance.
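
Because the model never saw old reasoning blocks during training, frontends should strip them from prior turns before rebuilding the prompt. Below is a minimal sketch, assuming the QwQ-style `<think>...</think>` tag convention and a chat history of `{"role", "content"}` dicts; `strip_reasoning` is a hypothetical helper, not part of any library:

```python
import re

# Matches a <think>...</think> reasoning block, including an unclosed
# trailing block, across newlines.
THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|$)", re.DOTALL)

def strip_reasoning(history: list[dict]) -> list[dict]:
    """Remove reasoning blocks from earlier assistant turns so the model
    never sees old chains of thought in its context, mirroring training.
    (Hypothetical helper for illustration.)"""
    cleaned = []
    for turn in history:
        if turn["role"] == "assistant":
            content = THINK_BLOCK.sub("", turn["content"]).strip()
            cleaned.append({"role": "assistant", "content": content})
        else:
            cleaned.append(turn)
    return cleaned
```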
Specs & Training:
- Base Model: QwQ-32B
- Parameters: 32B
- Max Context Length: 128K tokens (realistically 32K)
- Fine-tuning Method: RS-QLORA+ (Rank-Stabilized LoRA + LoRA Plus 8x); a configuration sketch follows this list
- Training Philosophy: A single-epoch, high-learning-rate approach maximizes what the model learns from each individual example and prevents overfitting to specific tropes, yielding higher creativity and less cross-context repetition.
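
The training recipe itself is not published in runnable form, but each named component maps onto common open tooling: peft exposes rank-stabilized LoRA via `use_rslora`, LoRA+ is typically implemented by giving the `lora_B` matrices a higher learning rate, and REX is a decay curve that stays above linear. The sketch below reflects those assumptions only; the ranks, learning rates, target modules, and step counts are illustrative, not ArliAI's actual settings:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA: load the frozen base model in 4-bit.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B", quantization_config=bnb, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

# Rank-stabilized LoRA: adapters are scaled by alpha/sqrt(r) rather than alpha/r.
lora_cfg = LoraConfig(
    r=64, lora_alpha=64, lora_dropout=0.0,  # illustrative values
    use_rslora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# LoRA+ "8x": the lora_B matrices train at 8x the base learning rate.
base_lr = 1e-4  # illustrative "high" single-epoch learning rate
a_params = [p for n, p in model.named_parameters() if p.requires_grad and "lora_B" not in n]
b_params = [p for n, p in model.named_parameters() if p.requires_grad and "lora_B" in n]
optimizer = torch.optim.AdamW([
    {"params": a_params, "lr": base_lr},
    {"params": b_params, "lr": base_lr * 8},
])

# REX decay: f(z) = (1 - z) / (1 - z/2) for z = step/total_steps.
# It stays above a linear decay, holding the learning rate higher for longer.
total_steps = 1000  # one epoch of steps, illustrative
rex = LambdaLR(optimizer, lambda s: (1 - s / total_steps) / (1 - s / (2 * total_steps)))
```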
When to Use This Model:
- Long-form Roleplay: Excels in multi-turn, complex narrative interactions where consistent reasoning is crucial.
- Creative Writing: Ideal for generating highly creative and varied outputs without falling into repetitive patterns.
- Applications Requiring Coherent Reasoning: Suitable for scenarios where the model must maintain a logical thought process throughout extended dialogues; a minimal inference sketch follows.
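
Putting it together, here is a minimal inference sketch using transformers; the prompt and sampling settings are illustrative, and any prior assistant turns should be cleaned of reasoning blocks as shown earlier:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ArliAI/QwQ-32B-ArliAI-RpR-v3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example roleplay turn; in a multi-turn session, strip old <think> blocks
# from the history before appending it here.
messages = [
    {"role": "system", "content": "You are the narrator of a long-form fantasy roleplay."},
    {"role": "user", "content": "The caravan reaches the ruined gate at dusk. What happens next?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative sampling settings; tune them for your frontend.
out = model.generate(inputs, max_new_tokens=1024, do_sample=True,
                     temperature=0.7, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```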