Model Overview
MS3.2-24b-Angel is a 24-billion-parameter model developed by allura-org, fine-tuned from Mistral Small 3.2. It is optimized for roleplaying, storywriting, and a broad range of general instruction-following tasks. The model demonstrates strong prose generation and character portrayal, with internal testing suggesting performance comparable to some 72B models.
Key Capabilities & Features
- Enhanced Roleplaying: Designed to excel in character portrayal and narrative consistency.
- Advanced Storywriting: Generates high-quality, coherent prose for creative writing tasks.
- General Instruction Following: Capable of handling a variety of instructional prompts with nuanced responses.
- Multimodal Support: The vision tower was manually re-added after training, so the model retains the base model's image-input capability.
Training Process Highlights
The model's development involved a multi-stage process:
- Vision Adapter Removal: The vision adapter was removed from the base Mistral Small 3.2 model before training, so fine-tuning could be run on the text-only weights.
- Supervised Fine-Tuning (SFT): Trained on general instruct, storytelling, and roleplaying data using Axolotl.
- KTO Process: Further refined with a KTO (Kahneman-Tversky Optimization) process using Unsloth, focusing on storywriting, anti-slop data, general instruction following, and human preference.
- Vision Tower Re-addition: The vision tower was manually merged back into the fine-tuned weights to restore multimodal capabilities (a rough sketch of this merge step follows below).
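
The card does not publish the exact merge script, but the re-addition step can be illustrated with a minimal sketch: load the original multimodal checkpoint and the text-only fine-tune, then copy the fine-tuned language-model tensors over the corresponding base tensors while leaving the vision tower and projector untouched. The base repository id, the local path `path/to/angel-text-only`, and the output directory below are assumptions for illustration, and it assumes the Hugging Face-format weights with a recent transformers release.

```python
# Illustrative sketch only -- not the authors' actual merge script.
import torch
from transformers import AutoModelForCausalLM, AutoModelForImageTextToText

# Original multimodal checkpoint (keeps its vision tower and projector).
base = AutoModelForImageTextToText.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506", torch_dtype=torch.bfloat16
)
# Text-only fine-tuned weights (hypothetical local path).
tuned = AutoModelForCausalLM.from_pretrained(
    "path/to/angel-text-only", torch_dtype=torch.bfloat16
)

merged_state = base.state_dict()

# Parameter names differ between the text-only and multimodal layouts
# (e.g. "model.layers.0..." vs "language_model.model.layers.0..."),
# so match fine-tuned tensors to base keys by suffix and shape.
for name, tensor in tuned.state_dict().items():
    for key in merged_state:
        if (key == name or key.endswith("." + name)) and merged_state[key].shape == tensor.shape:
            merged_state[key] = tensor

base.load_state_dict(merged_state)
base.save_pretrained("MS3.2-24b-Angel")  # hypothetical output directory
```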
Recommended Usage
Users are advised to use the Mistral v7 Tekken chat format for inference and, where their framework supports it (e.g., vLLM with --tokenizer-mode mistral), the official Mistral tokenization code. Recommended sampling parameters are temperature 1.2, min_p 0.1, and repetition penalty 1.05; an example is shown below.
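
For concreteness, here is a small offline-inference sketch with vLLM using these settings. The repository id allura-org/MS3.2-24b-Angel, the prompts, and the max_tokens value are assumptions; the same sampler values apply when serving with `vllm serve ... --tokenizer-mode mistral`.

```python
from vllm import LLM, SamplingParams

# Load the model with the official Mistral tokenization code, as recommended.
llm = LLM(
    model="allura-org/MS3.2-24b-Angel",
    tokenizer_mode="mistral",
)

# Recommended sampling parameters from this card.
params = SamplingParams(
    temperature=1.2,
    min_p=0.1,
    repetition_penalty=1.05,
    max_tokens=512,  # assumption; set to taste
)

messages = [
    {"role": "system", "content": "You are a creative roleplaying partner."},
    {"role": "user", "content": "Introduce your character as they step into the tavern."},
]

outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```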