allura-org/MS3.2-24b-Angel

Text Generation · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Ctx Length: 32k · Published: Jul 8, 2025 · Architecture: Transformer

allura-org/MS3.2-24b-Angel is a 24 billion parameter model, fine-tuned from Mistral Small 3.2, designed for advanced roleplaying, storywriting, and nuanced general instruction following. It features a 32768 token context length and excels in generating strong prose and character portrayals, rivalling larger models in its class. The model underwent a multi-stage training process including SFT and KTO, with its vision adapter initially removed for optimization and later re-added to support multimodality.


Model Overview

MS3.2-24b-Angel is a 24 billion parameter model developed by allura-org, fine-tuned from Mistral Small 3.2. It is specifically optimized for roleplaying, storywriting, and diverse general instruction-following use cases. The model demonstrates strong prose generation and character portrayal capabilities, with internal testing suggesting performance comparable to some 72B models.

Key Capabilities & Features

  • Enhanced Roleplaying: Designed to excel in character portrayal and narrative consistency.
  • Advanced Storywriting: Generates high-quality, coherent prose for creative writing tasks.
  • General Instruction Following: Capable of handling a variety of instructional prompts with nuanced responses.
  • Multimodal Support: The model's vision tower was manually re-added after training to support multimodality.

Training Process Highlights

The model's development involved a multi-stage process:

  1. Vision Adapter Removal: The base Mistral Small 3.2 model had its vision adapter initially removed for training optimization.
  2. Supervised Fine-Tuning (SFT): Trained on general instruct, storytelling, and roleplaying data using Axolotl.
  3. KTO Process: Further refined with a KTO (Kahneman-Tversky Optimization) process using Unsloth, focusing on storywriting, anti-slop data, general instruction following, and human preference.
  4. Vision Tower Re-addition: The vision tower was manually integrated back into the weights to restore multimodal capabilities.
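The final step above can be pictured as a simple state-dict merge. The sketch below is illustrative only, assuming the text-only fine-tune and the original multimodal checkpoint are available as plain weight dictionaries; the key names (`vision_tower.`, `lm_head.weight`) are hypothetical, not the actual checkpoint's keys:

```python
def merge_vision_tower(finetuned: dict, original: dict,
                       vision_prefix: str = "vision_tower.") -> dict:
    """Copy vision-tower tensors from the original multimodal
    checkpoint back into the fine-tuned text-only weights."""
    merged = dict(finetuned)  # start from the fine-tuned language weights
    for name, weight in original.items():
        if name.startswith(vision_prefix):
            merged[name] = weight  # restore each vision-adapter tensor
    return merged

# Toy example with scalar "weights":
finetuned = {"lm_head.weight": 1.0}
original = {"lm_head.weight": 0.0, "vision_tower.patch_embed.weight": 2.0}
merged = merge_vision_tower(finetuned, original)
# merged keeps the fine-tuned LM head and regains the vision tensor
```

The point of the sketch is that the language-model weights from the fine-tune are left untouched; only tensors under the vision prefix are taken from the original checkpoint.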

Recommended Usage

Users are advised to use the Mistral v7 Tekken chat template for inference and, if supported by their framework (e.g., vLLM with --tokenizer-mode mistral), the official Mistral tokenization code. Recommended sampling parameters are a temperature of 1.2, a min_p of 0.1, and a repetition penalty of 1.05.
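The recommended settings can be collected in one place and passed straight to an inference framework. A minimal sketch, assuming vLLM is the serving framework (the vLLM calls in the comments are illustrative and require a GPU and the model weights to actually run):

```python
# Recommended sampling parameters from this model card.
RECOMMENDED_SAMPLING = {
    "temperature": 1.2,
    "min_p": 0.1,
    "repetition_penalty": 1.05,
}

# With vLLM's Python API these map directly onto SamplingParams, e.g.:
#   from vllm import LLM, SamplingParams
#   llm = LLM("allura-org/MS3.2-24b-Angel", tokenizer_mode="mistral")
#   params = SamplingParams(**RECOMMENDED_SAMPLING)
#   outputs = llm.generate(["Once upon a time"], params)
```

The same three values apply to any backend that exposes temperature, min_p, and repetition_penalty knobs; only the plumbing differs.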

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. The configuration tabs cover the following sampler parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.