Qiskit/mistral-small-3.2-24b-qiskit

Parameters: 24B (FP8)
Context length: 32768
Modality: text + vision
Visibility: Public
Released: Oct 8, 2025
License: apache-2.0
Hosted on: Hugging Face

Overview

Mistral-Small-3.2-24B-Qiskit is a 24-billion-parameter language model from the Qiskit team, fine-tuned specifically for generating and understanding Qiskit code. It is built on Mistral-Small-3.2-24B-Instruct-2506 and instruction-tuned with Qiskit code and data released under permissive licenses (Apache 2.0, MIT, etc.). The model is trained against the latest Qiskit release (2.1), ensuring compatibility with current APIs and syntax, and supports a context window of up to 128K tokens.

Key Capabilities

  • Specialized Qiskit Code Generation: Significantly improved capabilities in writing high-quality, non-deprecated Qiskit code.
  • Long-Context Support: Handles up to 128K tokens, beneficial for complex quantum programming tasks.
  • Updated Training Data: Trained with Qiskit version 2.1, ensuring relevance and accuracy for modern quantum computing development.
  • Text and Vision Tasks: Inherits top-tier capabilities in both text and vision tasks from its base model.

Benchmarks

In comparative benchmarks, Mistral-Small-3.2-24B-Qiskit scores 32.45 on QiskitHumanEval-Hard and 47.02 on QiskitHumanEval, demonstrating strong performance on Qiskit-specific coding challenges. It also scores 97.50 on SciQ and 64.00 on MBPP.
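Code-generation benchmarks in the HumanEval family are typically reported as pass@k rates. Assuming these scores follow the standard unbiased pass@k estimator (n samples per task, c of them passing the tests), the metric can be computed as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per task and 5 correct, pass@1 reduces to the plain success rate:
print(pass_at_k(10, 5, 1))  # 0.5
```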

Training Details

The model was trained using IBM's supercomputing cluster (Vela) with NVIDIA A100 GPUs. Training data includes publicly available code datasets and synthetic data generated at IBM Quantum, with code older than 2023 excluded. Datasets undergo exact and fuzzy deduplication, and Personally Identifiable Information (PII) is redacted.
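The exact and fuzzy deduplication mentioned above can be sketched as follows. This is a minimal illustration, not IBM's actual pipeline: exact duplicates are caught by hashing whitespace-normalized text, near-duplicates by Jaccard similarity over character shingles.

```python
import hashlib

def shingles(text: str, k: int = 5) -> set:
    """Character k-grams of the whitespace-normalized text."""
    norm = " ".join(text.split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def dedup(docs, threshold: float = 0.8):
    """Keep each doc unless it is an exact or near duplicate of one already kept."""
    seen_hashes, kept, kept_shingles = set(), [], []
    for doc in docs:
        h = hashlib.sha256(" ".join(doc.split()).encode()).hexdigest()
        if h in seen_hashes:       # exact duplicate
            continue
        sh = shingles(doc)
        if any(jaccard(sh, s) >= threshold for s in kept_shingles):
            continue               # fuzzy (near) duplicate
        seen_hashes.add(h)
        kept.append(doc)
        kept_shingles.append(sh)
    return kept
```

At corpus scale, production pipelines usually replace the pairwise Jaccard comparison with MinHash/LSH so deduplication stays near-linear in the number of documents.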

Ethical Considerations and Limitations

While suited to a range of code-related tasks, the model has not undergone safety alignment and may produce problematic outputs. Users should exercise caution and should not rely solely on generated code for critical decisions. Hallucination in smaller models, possibly driven by memorization of training data, also remains an open research question.