Model Overview
Nabbers1999/MS-24B-Bathory-GRPO is a 24-billion-parameter Mistral-based model that has undergone multiple rounds of fine-tuning followed by GRPO (Group Relative Policy Optimization) training. The training aims to refine the model's writing, improve its instruction following, and strengthen its performance in long-context chat and roleplay scenarios. The model's configuration ships with a baked-in chat template, which addresses chat-completion-mode issues in llama.cpp for modern Mistral models.
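A baked-in chat template lets front ends render conversations without supplying a template manually. As a rough illustration of what such a template produces, here is a minimal sketch that follows the common Mistral `[INST]` convention; the model's actual template may differ in its exact token layout, so treat this as an assumption rather than the model's verbatim format:

```python
def format_mistral_chat(messages):
    """Render a chat into a Mistral-style instruct prompt.

    This mirrors the kind of template the card describes as baked into
    the model config. The token layout here (<s>, [INST], </s>) follows
    the widely used Mistral instruct convention and is an assumption,
    not the model's verbatim template.
    """
    out = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            out += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            # Assistant turns are appended and closed with an EOS token.
            out += f" {msg['content']}</s>"
    return out

chat = [{"role": "user", "content": "Describe the castle at dusk."}]
print(format_mistral_chat(chat))
```

In practice you would rely on the template shipped in the model's config (e.g. via a tokenizer's `apply_chat_template`) rather than hand-rolling the prompt as above.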
Key Capabilities
- Refined Writing Style: GRPO training with a custom BERT reward model (Nabbers1999/ModernBERT-StyleClassifier) reduces generic AI writing 'slop' and encourages more nuanced prose.
- Instruction Following: Multiple rounds of fine-tuning have improved the model's ability to follow complex instructions, including understanding SillyTavern's character card prompt format.
- Long-Context Roleplay & Chat: Optimized for maintaining continuity in multi-round chat and roleplay, making it suitable for interactive storytelling.
- Prose Length Control: Trained toward a soft limit of roughly 768 tokens per response, encouraging concise, focused outputs.
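The style-classifier reward and the prose-length target described above could plausibly be combined into a single GRPO reward signal. The card does not specify the actual reward function, so the following is only a hedged sketch: `style_score` stands in for the style classifier's probability that a response is non-'slop' prose, and the penalty coefficient is an invented illustrative value:

```python
def prose_reward(style_score: float, num_tokens: int,
                 token_limit: int = 768,
                 length_penalty: float = 0.001) -> float:
    """Combine a style-classifier score with a soft length penalty.

    style_score: assumed probability (0..1) from a style classifier
    such as the card's ModernBERT classifier that the text is good,
    non-generic prose. Responses within token_limit keep their full
    score; overlong ones lose length_penalty per excess token.
    """
    overflow = max(0, num_tokens - token_limit)
    return style_score - length_penalty * overflow

# A response within the limit keeps its style score unchanged.
print(prose_reward(0.9, 500))    # ≈ 0.9
# An overlong response is penalized for each token past 768.
print(prose_reward(0.9, 1268))   # ≈ 0.4
```

A soft penalty like this nudges the policy toward the 768-token target without hard-truncating outputs; the actual training setup may weight these terms differently.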
Good For
- Creative Writing: Generating prose with a refined style and specific narrative instructions.
- Roleplay & Interactive Fiction: Maintaining character consistency and narrative flow over extended conversations.
- SillyTavern Users: Improved understanding and utilization of character card prompts.