TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill

Text generation · Concurrency cost: 2 · Model size: 24B · Quant: FP8 · Context length: 32k · Published: Feb 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill is a 24-billion-parameter Mistral-based language model developed by TeichAI, fine-tuned from unsloth/devstral-small-2505. As the name suggests, it distills outputs from DeepSeek-V3.2 "Speciale" into the Devstral coding base. The model was trained with Unsloth and Hugging Face's TRL library, which TeichAI reports made training roughly 2x faster. With a 32768-token context window, it is suited to applications that require efficient processing of longer sequences, and it should retain the coding-oriented strengths of its Devstral base.


Model Overview

TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill is a 24-billion-parameter language model developed by TeichAI. It is fine-tuned from the unsloth/devstral-small-2505 base model, which belongs to the Mistral model family. Training used the Unsloth library together with Hugging Face's TRL library, which TeichAI reports made training roughly 2x faster.
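The repository appears to follow the standard Hugging Face layout, so a minimal inference sketch with transformers might look like the following. The dtype and device settings are assumptions to adjust for your hardware, and the FP8 quantization listed above may require a dedicated serving runtime instead:

```python
# Minimal inference sketch, assuming the repo ships standard
# transformers artifacts (config, tokenizer, safetensors weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype your hardware supports
    device_map="auto",           # shard across available GPUs
)

# Assumption: the tokenizer ships a chat template, as is typical for
# chat-tuned Devstral derivatives.
messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```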

Key Characteristics

  • Architecture: Based on the Mistral model family.
  • Parameter Count: 24 billion.
  • Training Efficiency: Trained with Unsloth for a reported 2x speedup.
  • Context Length: Supports a context window of 32768 tokens (see the prompt-budgeting sketch after this list).
  • License: Distributed under the Apache-2.0 license.
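
As a quick illustration of working within that context window, here is a small prompt-budgeting sketch. The 1024-token generation budget and the input file are illustrative assumptions, not values from the model card:

```python
# Illustrative sketch: trim a long prompt so prompt + generation
# stays within the model's 32768-token context window.
from transformers import AutoTokenizer

CTX_LEN = 32768  # advertised context length

def fit_prompt(tokenizer, text: str, max_new_tokens: int = 1024) -> str:
    """Keep the most recent tokens so the prompt leaves room for generation."""
    budget = CTX_LEN - max_new_tokens
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    if len(ids) <= budget:
        return text
    # Keep the tail of the document; earlier content is dropped.
    return tokenizer.decode(ids[-budget:])

tokenizer = AutoTokenizer.from_pretrained(
    "TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill"
)
prompt = fit_prompt(tokenizer, open("large_source_file.py").read())  # hypothetical input file
```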

Good For

  • Applications requiring a powerful Mistral-based model with a substantial parameter count.
  • Use cases that benefit from cheap, fast fine-tuning, such as building specialized variants or refreshing a model frequently (a hedged fine-tuning sketch follows this list).
  • Tasks that demand a large context window for processing extensive inputs or generating detailed outputs.
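
To illustrate the training recipe the card describes, here is a hedged Unsloth + TRL fine-tuning sketch. The dataset, LoRA ranks, and batch settings are placeholders, and exact argument names vary across TRL versions:

```python
# Hedged sketch of the Unsloth + TRL recipe the card describes.
# Dataset name and all hyperparameters below are placeholders.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="TeichAI/Devstral-Small-2505-Deepseek-V3.2-Speciale-Distill",
    max_seq_length=4096,   # placeholder; the model supports up to 32768
    load_in_4bit=True,     # QLoRA-style loading to fit a 24B model on one GPU
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("your-org/your-sft-dataset", split="train")  # placeholder

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,   # newer TRL versions call this processing_class
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
```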