ArliAI/QwQ-32B-ArliAI-RpR-v4

Status: Warm · Visibility: Public · Parameters: 32B · Quantization: FP8 · Context length: 32,768 tokens · License: apache-2.0 · Source: Hugging Face

Overview

QwQ-32B-ArliAI-RpR-v4 is a 32-billion-parameter model from ArliAI's RpR (RolePlay with Reasoning) series, fine-tuned from the QwQ-32B base model. It applies the dataset curation methodology originally developed for the RPMax series to improve creative writing and roleplay performance. Its key innovation is a reasoning-focused RP dataset, built by processing the RPMax dataset with the QwQ Instruct model itself, which lets the model stay coherent and interesting across long, multi-turn roleplay chats while retaining its reasoning abilities.
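
To make the dataset-creation step concrete, here is a minimal, hypothetical sketch of that reasoning augmentation: each existing RPMax response is paired with a freshly generated <think> trace from the instruct model. The endpoint, model id, and augment_with_reasoning helper are illustrative assumptions, not ArliAI's published pipeline.

```python
from openai import OpenAI

# Hypothetical setup: any OpenAI-compatible server hosting a QwQ Instruct model.
# The URL, key, and model id below are placeholders, not ArliAI's actual pipeline.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def augment_with_reasoning(user_turn: str, assistant_turn: str) -> str:
    """Generate a plausible reasoning trace for an existing RPMax response
    and prepend it as a <think> block, yielding a reasoning-RP training pair."""
    prompt = (
        "Given this roleplay exchange, write the step-by-step reasoning the "
        "assistant could have used to arrive at its reply.\n\n"
        f"User: {user_turn}\nAssistant: {assistant_turn}"
    )
    reasoning = client.chat.completions.create(
        model="Qwen/QwQ-32B",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    return f"<think>\n{reasoning}\n</think>\n{assistant_turn}"
```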

Key Capabilities & Features

  • Optimized for Creative Writing & Roleplay: Designed to produce highly creative and varied outputs, minimizing cross-context repetition and generic tropes.
  • Enhanced Reasoning: Trained with a method that lets the model reason at inference time without earlier reasoning blocks remaining in its context, yielding more consistent and logical responses in complex scenarios (see the context-pruning sketch after this list).
  • Reduced Repetition & Impersonation: Utilizes advanced filtering during training to mitigate common LLM issues like repetitive phrases and speaking for the user.
  • Extended Context Awareness: Trained with a sequence length of 16K, supporting a native context length of 32K tokens, which aids in memory and awareness over longer conversations.
  • Unique Training Methodology: Employs a single-epoch training approach with a higher learning rate and low gradient accumulation to prevent overfitting and encourage diverse response generation.
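
In practice, the "reasoning without reasoning blocks in context" behavior mentioned above means the client should prune prior <think>...</think> spans from the chat history before each new request. Below is a minimal sketch of that pruning, assuming standard chat-message dicts; the strip_reasoning helper is illustrative, not an official API.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(messages: list[dict]) -> list[dict]:
    """Drop <think>...</think> spans from earlier assistant turns so the model
    never sees stale reasoning in its context, mirroring how RpR was trained."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Describe the tavern as I walk in."},
    {"role": "assistant", "content": "<think>Set a moody tone.</think>The tavern reeks of pine smoke and spilled ale..."},
    {"role": "user", "content": "Who approaches my table?"},
]
print(strip_reasoning(history))  # the <think> block is gone from turn two
```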

Good For

  • Long Multi-Turn Roleplay: Excels in maintaining coherence and creativity across extended interactive roleplay sessions.
  • Creative Writing Applications: Ideal for generating varied and imaginative text, stories, and character interactions.
  • Applications Requiring Reasoning in Conversational Contexts: Suitable for scenarios where logical progression and consistent character behavior are crucial over many turns.

Usage Notes

  • Sampler Settings: Use simple sampler settings (e.g., Temperature 1.0, MinP 0.02, TopK 40) and allow a generous response budget (2048+ tokens) so both the reasoning block and the visible reply fit; a request sketch follows this list.
  • Reasoning Block Configuration: In reasoning-aware front ends such as SillyTavern, set the reasoning prefix and suffix (e.g., <think> and </think>) so the model's reasoning block is parsed correctly.
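
As a concrete starting point, the following minimal sketch applies those recommended sampler settings through an OpenAI-compatible client (e.g., against a vLLM server). The base URL and API key are placeholders; min_p and top_k are passed via extra_body because they are server-side extensions rather than standard OpenAI parameters.

```python
from openai import OpenAI

# Placeholder endpoint and key; point these at whatever server hosts the model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

response = client.chat.completions.create(
    model="ArliAI/QwQ-32B-ArliAI-RpR-v4",
    messages=[{"role": "user", "content": "You are the narrator. Open the scene."}],
    temperature=1.0,   # keep sampling simple, per the recommendation above
    max_tokens=2048,   # leave room for the <think> block plus the visible reply
    # MinP and TopK are server-side extensions (e.g., vLLM), so they go in extra_body.
    extra_body={"min_p": 0.02, "top_k": 40},
)
print(response.choices[0].message.content)
```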