MiniMaxAI/MiniMax-M3

TEXT GENERATION
Concurrency Cost: 4
Model Size: 427B
Quant: FP8
Ctx Length: 195k
Published: Jun 2, 2026
License: other
Architecture: Transformer
0.5K Warm

MiniMaxAI's MiniMax-M3 is a 427 billion parameter native multimodal model with a 200,000 token context length. It is trained on mixed modalities from the start, which is intended to give deep semantic fusion across text, image, and video. The model uses MiniMax Sparse Attention (MSA) for efficient long-context processing, with reported 9x prefill and 15x decode speedups over its predecessor (M2) and lower per-token compute. MiniMax-M3 targets complex reasoning, agentic tasks, and long-horizon collaboration, particularly coding and general "cowork" workloads.

MiniMax-M3: A Native Multimodal Model

MiniMax-M3, developed by MiniMaxAI, is a 427 billion parameter native multimodal model featuring a 200,000 token context window. It is distinguished by its mixed-modality training from the outset, enabling deep semantic integration across text, image, and video data.

Key Capabilities & Innovations

  • Native Multimodality: Achieves deeper semantic fusion by training on mixed modalities (text, image, video) from the initial stages.
  • Context Scaling with MiniMax Sparse Attention (MSA): Introduces a high-performance sparse attention operator to improve long-context efficiency. Compared with its predecessor (M2), MSA delivers 9x faster prefill and 15x faster decode at a 1M-token context, with a 20x reduction in per-token compute. More details are available in the technical report.
  • Coding & Cowork Capability: Performs strongly on long-horizon agentic benchmarks, which reflects its proficiency in coding and collaborative tasks.

Recommended Use Cases

MiniMax-M3 supports two primary reasoning modes:

  • Thinking Mode: Ideal for complex reasoning, agentic tasks, and long-horizon collaboration.
  • Non-thinking Mode: Suitable for latency-sensitive applications such as chat and code completion.

The recommended sampling parameters are temperature=1.0, top_p=0.95, and top_k=40.
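
To make the three sampling parameters concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p are commonly combined when choosing the next token. It is an illustration only, not MiniMax-M3's or any serving stack's actual sampler: the function name sample_token is hypothetical, and the filter order (temperature, then top-k, then top-p) is an assumption, since implementations differ on ordering and tie handling.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=40, top_p=0.95, rng=random):
    """Pick one token index from raw logits (illustrative sketch only)."""
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: values below 1.0 sharpen the distribution, values above
    # 1.0 flatten it, and 1.0 leaves the logits unchanged.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token indices.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k] if top_k > 0 else order

    # Softmax over the kept tokens (subtract the max for numerical stability).
    peak = scaled[kept[0]]
    weights = [math.exp(scaled[i] - peak) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): keep the smallest prefix of the sorted tokens whose
    # cumulative probability reaches top_p.
    cumulative = 0.0
    cutoff = len(kept)
    for n, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            cutoff = n
            break

    # random.choices renormalizes the remaining weights internally.
    return rng.choices(kept[:cutoff], weights=probs[:cutoff], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0, -4.0]  # made-up scores for 5 tokens
    print(sample_token(toy_logits, temperature=1.0, top_k=40, top_p=0.95))
```

With the recommended values, temperature=1.0 leaves the model's distribution unscaled, top_k=40 caps the candidate set at 40 tokens, and top_p=0.95 then trims the low-probability tail of that set. In practice these values are passed to the inference server or client library rather than implemented by hand.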