The "Three Conditions" No Open Model Had Combined Before
MiniMax released MiniMax M3 on May 31, 2026, framing it as the first open model to meet three conditions that, until now, only closed-source frontier models had satisfied at the same time:
- Frontier-level coding and agentic capability — 59% on SWE-Bench Pro
- Ultra-long context — 1 million tokens natively
- Native multimodality — images, video input, and desktop computer operation out of the box
MiniMax Sparse Attention (MSA): The Architecture Behind the Speed
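The technical core of M3 is MiniMax Sparse Attention (MSA), a new attention architecture designed to ease the memory and compute bottleneck that standard transformers hit at long contexts, where the cost of full self-attention grows quadratically with sequence length. MiniMax reports the following gains: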
- At 1M context: cost per token is just 1/10th of the previous generation
- 9x acceleration in the prefill stage
- 15x acceleration in the decoding stage
- 4x higher compute speed vs. comparable open-source solutions
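To give a rough sense of why sparse attention helps at long contexts, the sketch below counts how many query-key pairs a full causal attention layer scores versus a scheme where each query attends to at most a fixed number of keys. This is a generic illustration, not a description of how MSA works: the article does not disclose MSA's selection mechanism, and the `budget` value and both function names are assumptions made for the example.

```python
def dense_causal_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full causal attention (token i attends to tokens 0..i)."""
    return n_tokens * (n_tokens + 1) // 2


def budgeted_sparse_pairs(n_tokens: int, budget: int) -> int:
    """Pairs scored when each query attends to at most `budget` keys (hypothetical scheme)."""
    if n_tokens <= budget:
        return dense_causal_pairs(n_tokens)
    # The first `budget` queries have fewer than `budget` keys available, so they stay dense.
    return dense_causal_pairs(budget) + (n_tokens - budget) * budget


if __name__ == "__main__":
    budget = 4096  # assumed per-query key budget, chosen only for illustration
    for n in (8_192, 131_072, 1_000_000):
        dense = dense_causal_pairs(n)
        sparse = budgeted_sparse_pairs(n, budget)
        print(f"{n:>9} tokens: dense/sparse pair ratio = {dense / sparse:.1f}")
```

The dense count grows with the square of the context length while the budgeted count grows linearly, so the gap widens as the context approaches 1M tokens. Real speedups also depend on memory traffic and kernel efficiency, which this count ignores.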
This matters for agents. Long-running agent tasks (24-hour sessions, thousands of tool calls) accumulate context constantly. According to MiniMax, M3 can hold an entire codebase, thousands of log entries, or an hour-long video in context without degraded performance.
MiniMax described three demonstrations of M3 working autonomously: reproducing experiments from a top ICLR paper over 12 hours; running continuously for 24 hours without reference code while making roughly 2,000 tool calls; and raising FP8 matrix-multiplication hardware utilization on a Hopper GPU from 7.6% to 71.3%.
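The figures in these demonstrations can be put in perspective with a little arithmetic. The sketch below assumes, purely for illustration, that all ~2,000 tool calls shared a single 1M-token window with nothing evicted or summarized (the article does not say how context was managed), and it computes the relative gain implied by the utilization numbers. The function names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # context length stated in the article
TOOL_CALLS = 2_000                 # approximate count from the 24-hour demo


def tokens_per_call_budget(window_tokens: int, calls: int) -> float:
    """Average tokens each call could add if the window is never trimmed (assumption)."""
    return window_tokens / calls


def utilization_gain(before_pct: float, after_pct: float) -> float:
    """Ratio of hardware utilization after optimization to utilization before."""
    return after_pct / before_pct


if __name__ == "__main__":
    budget = tokens_per_call_budget(CONTEXT_WINDOW_TOKENS, TOOL_CALLS)
    gain = utilization_gain(7.6, 71.3)
    print(f"Average budget per tool call: {budget:.0f} tokens")
    print(f"FP8 matmul utilization gain: {gain:.2f}x")
```

Under that assumption, each call would have an average budget of only a few hundred tokens, which suggests why long sessions put pressure on context length even at 1M tokens.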
Benchmark Comparison
| Benchmark | MiniMax M3 | GPT-5.5 | Claude Opus 4.7 |
|---|---|---|---|
| SWE-Bench Pro | 59.0% | 58.6% | ~53% |
| MCP Atlas | 74.2% | — | — |
| Terminal-Bench 2.1 | 66.0% | 83.4% | 69.7% |
| Video-MME | 84.6% | — | — |
M3 narrowly leads GPT-5.5 on SWE-Bench Pro (59.0% vs. 58.6%), making it the highest-scoring open-weight model on that benchmark. However, it trails both GPT-5.5 (83.4%) and Claude Opus 4.7 (69.7%) on Terminal-Bench 2.1, where it scores 66.0%, so its advantage is selective rather than across the board.
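The margins in the comparison table can be computed directly. The sketch below copies the scores from the table, treats the approximate "~53%" entry as 53.0, and skips cells the table leaves blank; the `margins` helper is a hypothetical name used only for this example.

```python
# Scores copied from the comparison table; None marks a score the table does not report.
SCORES = {
    "SWE-Bench Pro": {"MiniMax M3": 59.0, "GPT-5.5": 58.6, "Claude Opus 4.7": 53.0},  # Claude value is approximate
    "MCP Atlas": {"MiniMax M3": 74.2, "GPT-5.5": None, "Claude Opus 4.7": None},
    "Terminal-Bench 2.1": {"MiniMax M3": 66.0, "GPT-5.5": 83.4, "Claude Opus 4.7": 69.7},
    "Video-MME": {"MiniMax M3": 84.6, "GPT-5.5": None, "Claude Opus 4.7": None},
}


def margins(benchmark: str, model: str = "MiniMax M3") -> dict:
    """Percentage-point lead (positive) or deficit (negative) of `model` over each other model."""
    row = SCORES[benchmark]
    base = row[model]
    return {
        other: round(base - score, 1)
        for other, score in row.items()
        if other != model and score is not None
    }


if __name__ == "__main__":
    for name in SCORES:
        print(name, margins(name))
```

The SWE-Bench Pro lead over GPT-5.5 is 0.4 points, while the Terminal-Bench 2.1 deficit is 17.4 points, so the two results are far from symmetric.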
Model weights are not yet public, but M3 is live via API, OpenRouter, MiniMax Code (code.minimax.io), and the Hermes agent. An introductory 50% discount brings pricing to $0.30 per million input tokens and $1.20 per million output tokens, roughly 17x cheaper than Claude Opus on a per-token basis at current rates.
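As a quick way to reason about these rates, the sketch below estimates the cost of a single request at the discounted prices and backs out the implied pre-discount rates. It assumes the 50% discount applies uniformly to input and output tokens and that there are no other charges; the function names and the example token counts are illustrative, not taken from MiniMax's documentation.

```python
DISCOUNTED_USD_PER_M_INPUT = 0.30   # introductory price quoted in the article
DISCOUNTED_USD_PER_M_OUTPUT = 1.20  # introductory price quoted in the article
INTRO_DISCOUNT = 0.50               # 50% discount quoted in the article


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the discounted per-million-token rates."""
    return (input_tokens * DISCOUNTED_USD_PER_M_INPUT
            + output_tokens * DISCOUNTED_USD_PER_M_OUTPUT) / 1_000_000


def implied_list_price(discounted: float, discount: float = INTRO_DISCOUNT) -> float:
    """Back out the pre-discount rate, assuming the discount applies uniformly."""
    return discounted / (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token context with a 10,000-token reply.
    print(f"Request cost: ${request_cost_usd(1_000_000, 10_000):.3f}")
    print(f"Implied list input price: ${implied_list_price(DISCOUNTED_USD_PER_M_INPUT):.2f}/M")
    print(f"Implied list output price: ${implied_list_price(DISCOUNTED_USD_PER_M_OUTPUT):.2f}/M")
```

At these rates, input tokens dominate the cost of a request that fills the whole context window, which is worth keeping in mind for agent loops that resend a growing context on every call.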
The Chinese Open-Source AI Wave
M3 doesn't arrive in isolation. Within days of each other in late May and early June 2026, three Chinese AI labs released frontier open-weight models: MiniMax M3, Zhipu's GLM-5.1 (SWE-Bench Pro leader among open models), and Moonshot's Kimi K2.6 (86.3% on the BrowseComp agent benchmark). Together, these releases challenge the assumption that frontier AI requires a closed-source, expensive commercial model.
Key Takeaways
- MiniMax M3: first open-weight model to combine frontier coding, 1M context, and native multimodality
- MSA architecture delivers 15x faster decoding and 10x lower cost at 1M token context
- 59% SWE-Bench Pro — narrows the gap between open and closed frontier models
- Live today via API/OpenRouter; Hugging Face weights and full tech report releasing soon
- Part of a broader wave of high-capability open-weight models from Chinese AI labs