Model Overview
casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only is a 24-billion-parameter base model derived from mistralai/Mistral-Small-3.1-24B-Base-2503. This version is designed specifically for text-only applications: the original model's vision encoder was removed and its architecture adapted accordingly. It also serves as the base for mistralai/Devstral-Small-2505.
Key Capabilities
- Text-Only Processing: Optimized exclusively for natural language understanding and generation tasks, without multimodal capabilities.
- Extended Context Length: Supports a 128k-token context window, enabling processing of long documents and complex queries.
- Strong General Reasoning: Achieves a 0-shot MMLU score of 77.25%, demonstrating robust performance across a wide range of academic and common-sense reasoning benchmarks. This score is very close to the 77.34% of its original multimodal variant, indicating minimal performance degradation from the text-only conversion.
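The 128k-token window above can be sanity-checked against a candidate prompt before inference. Below is a minimal, illustrative sketch using a rough characters-per-token heuristic; the helper names and the ~4 characters/token ratio are assumptions for illustration, not part of the model card (for exact counts, tokenize the text with the model's own tokenizer):

```python
def estimated_tokens(text: str, chars_per_token: float = 4.0) -> float:
    """Rough token estimate for English text (~4 characters per token)."""
    return len(text) / chars_per_token

def fits_context(text: str,
                 context_tokens: int = 128_000,
                 chars_per_token: float = 4.0) -> bool:
    """Return True if the text likely fits in the model's context window."""
    return estimated_tokens(text, chars_per_token) <= context_tokens

# Example: a ~1M-character document (~250k estimated tokens) exceeds 128k tokens.
print(fits_context("x" * 1_000_000))  # False
```

In practice, the heuristic only gates obviously oversized inputs; for precise budgeting, count tokens with the tokenizer shipped alongside the checkpoint.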
Use Cases
This model is particularly well-suited for applications where high-performance text processing is required without the overhead of multimodal capabilities. Its large context window makes it ideal for:
- Long-form content generation and summarization
- Complex question answering and information extraction
- Code generation and analysis (as a base for models like Devstral-Small-2505)
- General-purpose conversational AI and chatbots