MiniMaxAI/MiniMax-M3
MiniMaxAI's MiniMax-M3 is a 427-billion-parameter native multimodal model with a 200,000-token context length. It is trained on mixed modalities from the start, with the goal of deep semantic fusion across text, image, and video. The model uses MiniMax Sparse Attention (MSA) for efficient long-context processing, which speeds up prefill and decode and reduces per-token compute. MiniMax-M3 excels in complex reasoning, agentic tasks, and long-horizon collaboration, particularly in coding and general "cowork" tasks.
MiniMax-M3: A Native Multimodal Model
MiniMax-M3, developed by MiniMaxAI, is a 427-billion-parameter native multimodal model with a 200,000-token context window. Its distinguishing feature is mixed-modality training from the outset, which enables deep semantic integration across text, image, and video data.
Key Capabilities & Innovations
- Native Multimodality: Achieves deeper semantic fusion by training on mixed modalities (text, image, video) from the initial stages.
- Context Scaling with MiniMax Sparse Attention (MSA): Introduces a high-performance sparse attention operator to improve long-context efficiency. At 1M-token context, MSA delivers 9x faster prefill and 15x faster decode than the predecessor model (M2) and reduces per-token compute by a factor of 20. More details are available in the technical report.
- Coding & Cowork Capability: Performs strongly on long-horizon agentic benchmarks, reflecting proficiency in coding and collaborative tasks.
Recommended Use Cases
MiniMax-M3 supports two primary reasoning modes:
- Thinking Mode: Ideal for complex reasoning, agentic tasks, and long-horizon collaboration.
- Non-thinking Mode: Suitable for latency-sensitive applications such as chat and code completion.
The recommended sampling parameters are temperature=1.0, top_p=0.95, and top_k=40.
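To make the effect of these three parameters concrete, the following is a minimal pure-Python sketch of how temperature, top-k, and top-p are commonly combined to narrow a next-token distribution before sampling. It is not MiniMax-M3's actual inference code: the function name `filter_distribution`, the toy logits, and the ordering (temperature, then top-k, then top-p) are assumptions chosen for illustration, and real inference engines may differ in detail.

```python
import math

# Recommended values quoted in the model card.
TEMPERATURE = 1.0
TOP_P = 0.95
TOP_K = 40


def filter_distribution(logits, temperature=TEMPERATURE, top_k=TOP_K, top_p=TOP_P):
    """Hypothetical helper: return {token_index: probability} after
    temperature scaling, top-k filtering, and top-p (nucleus) filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep the k most probable tokens, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    k_probs = {i: probs[i] / k_mass for i in ranked}

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += k_probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens so they sum to 1.
    p_mass = sum(k_probs[i] for i in kept)
    return {i: k_probs[i] / p_mass for i in kept}


# Toy four-token vocabulary with probabilities 0.60, 0.30, 0.08, 0.02.
toy_logits = [math.log(p) for p in (0.60, 0.30, 0.08, 0.02)]
filtered = filter_distribution(toy_logits)
```

With the toy distribution above, top_k=40 keeps all four tokens (the vocabulary is smaller than k), and top_p=0.95 then drops the least likely token because the first three already cover 98% of the probability mass. In practice these values would be passed to the serving framework's sampling configuration rather than implemented by hand.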