TL;DR — MiniMax M3 is the first open-weight model to combine frontier coding (59% on SWE-Bench Pro, ahead of GPT-5.5), a 1M-token context window, and native multimodal support. Its custom MSA architecture makes long-context inference up to 15x faster and 10x cheaper. Weights are coming to Hugging Face soon; the API is live today.

The "Three Conditions" No Open Model Had Combined Before

MiniMax released MiniMax M3 on May 31, 2026, framing it as the first open-weight model to meet what the company calls the "three conditions" that only closed-source frontier models had previously satisfied at once:

  1. Frontier-level coding and agentic capability — 59% on SWE-Bench Pro
  2. Ultra-long context — 1 million tokens natively
  3. Native multimodality — images, video input, and desktop computer operation out of the box

Headline figures: 59.0% SWE-Bench Pro score · 1M-token context window · 15x faster decoding vs. alternatives

MiniMax Sparse Attention (MSA): The Architecture Behind the Speed

The technical core of M3 is MiniMax Sparse Attention (MSA), a new attention architecture that bypasses the memory and compute bottleneck traditional transformers hit at long contexts.
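
To see why sparsity matters at this scale, it helps to count attention-score computations. The sketch below is not MSA itself, since the source does not describe how MSA selects what to attend to. It assumes a generic scheme in which each query attends to at most a fixed budget of earlier tokens; `keys_per_query` and its value are hypothetical, chosen only for illustration.

```python
def dense_causal_scores(n_tokens: int) -> int:
    """Query-key scores computed by dense causal attention (one layer, one head).

    Token i attends to tokens 1..i, so the total is n * (n + 1) / 2.
    """
    return n_tokens * (n_tokens + 1) // 2


def sparse_causal_scores(n_tokens: int, keys_per_query: int) -> int:
    """Scores computed when each query attends to at most `keys_per_query` keys.

    `keys_per_query` is a hypothetical budget, not a published MSA setting.
    """
    k = min(keys_per_query, n_tokens)
    # The first k tokens have fewer than k predecessors, so they are dense;
    # every later token computes exactly k scores.
    return k * (k + 1) // 2 + (n_tokens - k) * k


if __name__ == "__main__":
    n = 1_000_000          # context length from the article
    k = 4_096              # assumed per-query budget (illustrative only)
    dense = dense_causal_scores(n)
    sparse = sparse_causal_scores(n, k)
    print(f"dense scores:  {dense:,}")
    print(f"sparse scores: {sparse:,}")
    print(f"reduction:     {dense / sparse:.1f}x")
```

The dense count grows quadratically with context length while the budgeted count grows roughly linearly, which is the general reason sparse attention pays off most at long contexts. The ratio printed here depends entirely on the assumed budget and is not meant to reproduce MiniMax's reported figures.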

  • At 1M context: cost per token is just 1/10th of the previous generation
  • 9x acceleration in the prefill stage
  • 15x acceleration in the decoding stage
  • 4x higher compute speed vs. comparable open-source solutions
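
Because the reported prefill and decode speedups differ, the end-to-end gain for a given request depends on how its time splits between the two stages. The following is a minimal sketch, assuming the 9x and 15x figures apply uniformly to each stage (real gains will vary with context length and hardware). The function name and the baseline timings are hypothetical.

```python
def estimated_latency(
    baseline_prefill_s: float,
    baseline_decode_s: float,
    prefill_speedup: float = 9.0,   # reported prefill acceleration
    decode_speedup: float = 15.0,   # reported decode acceleration
) -> float:
    """Estimate request latency after applying per-stage speedups."""
    if prefill_speedup <= 0 or decode_speedup <= 0:
        raise ValueError("speedups must be positive")
    return baseline_prefill_s / prefill_speedup + baseline_decode_s / decode_speedup


if __name__ == "__main__":
    # Hypothetical baseline: 90 s of prefill and 150 s of decoding.
    prefill_s, decode_s = 90.0, 150.0
    before = prefill_s + decode_s
    after = estimated_latency(prefill_s, decode_s)
    print(f"baseline: {before:.1f} s, estimated: {after:.1f} s")
    print(f"end-to-end speedup: {before / after:.1f}x")
```

The overall speedup always lands between the two stage figures: a prefill-heavy request (long prompt, short answer) sits closer to 9x, while a decode-heavy one sits closer to 15x.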

This matters enormously for agents. Long-running agent tasks — 24-hour sessions, thousands of tool calls — constantly accumulate context. M3 can hold an entire codebase, thousands of log entries, or an hour-long video in memory without degrading.
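
A rough way to see how quickly tool calls consume a context window is to divide the window by the tokens each call appends. This sketch assumes every call adds a fixed number of tokens and that nothing is summarized or evicted, which real agent frameworks often do. `calls_until_full` and the token counts are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, the window size stated in the article


def calls_until_full(
    system_tokens: int,
    tokens_per_call: int,
    window: int = CONTEXT_WINDOW,
) -> int:
    """Number of tool calls that fit before the accumulated context exceeds `window`.

    Assumes each call appends exactly `tokens_per_call` tokens and that no
    earlier context is compacted or dropped.
    """
    if tokens_per_call <= 0:
        raise ValueError("tokens_per_call must be positive")
    remaining = window - system_tokens
    if remaining <= 0:
        return 0
    return remaining // tokens_per_call


if __name__ == "__main__":
    # Compare a 1M-token window with a smaller, assumed 128K window.
    for window in (128_000, CONTEXT_WINDOW):
        n = calls_until_full(system_tokens=10_000, tokens_per_call=500, window=window)
        print(f"{window:>9,} token window: about {n:,} calls before overflow")
```

Under these assumptions, a session with thousands of tool calls only fits in a single window if the window is very large or each call's output is kept small.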

What does MiniMax say it actually did with M3?
MiniMax says M3 autonomously reproduced experiments from a top ICLR paper over 12 hours; ran continuously for 24 hours without reference code, making about 2,000 tool calls; and raised FP8 matrix-multiplication hardware utilization on a Hopper GPU from 7.6% to 71.3%.

Benchmark Comparison

Benchmark            MiniMax M3   GPT-5.5   Claude Opus 4.7
SWE-Bench Pro        59.0%        58.6%     ~53%
MCP Atlas            74.2%        -         -
Terminal-Bench 2.1   66.0%        83.4%     69.7%
Video-MME            84.6%        -         -

M3 edges out GPT-5.5 on SWE-Bench Pro (59.0% vs. 58.6%) and is the highest-scoring open-weight model on that benchmark. However, at 66.0% it trails both GPT-5.5 (83.4%) and Claude Opus 4.7 (69.7%) on Terminal-Bench 2.1, so the advantage is selective rather than across the board.

Pricing and How to Access Today
Model weights are not yet public, but M3 is live via API, OpenRouter, MiniMax Code (code.minimax.io), and the Hermes agent. An introductory 50% discount brings pricing to $0.30/M input tokens and $1.20/M output tokens — roughly 17x cheaper than Claude Opus on a per-token basis at current rates.
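
To turn the per-million rates into a per-request figure, multiply each token count by its rate. This is a minimal sketch that assumes flat per-token pricing with no caching or tiering. The pre-discount price is inferred from the stated 50% discount and is not quoted in the announcement; the function names are hypothetical.

```python
INTRO_INPUT_PER_M = 0.30    # USD per million input tokens (introductory rate)
INTRO_OUTPUT_PER_M = 1.20   # USD per million output tokens (introductory rate)
INTRO_DISCOUNT = 0.50       # the stated introductory discount


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_m: float = INTRO_INPUT_PER_M,
    output_per_m: float = INTRO_OUTPUT_PER_M,
) -> float:
    """Cost in USD of one request at flat per-million-token rates."""
    return (input_tokens / 1_000_000) * input_per_m + (
        output_tokens / 1_000_000
    ) * output_per_m


def implied_list_price(discounted: float, discount: float = INTRO_DISCOUNT) -> float:
    """Pre-discount rate implied by a discounted rate (an inference, not a quoted price)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return discounted / (1 - discount)


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token prompt with a 100K-token response.
    intro = request_cost(1_000_000, 100_000)
    after_promo = request_cost(
        1_000_000,
        100_000,
        input_per_m=implied_list_price(INTRO_INPUT_PER_M),
        output_per_m=implied_list_price(INTRO_OUTPUT_PER_M),
    )
    print(f"introductory pricing: ${intro:.2f}")
    print(f"implied list pricing: ${after_promo:.2f}")
```

Long-context agent workloads are dominated by input tokens, so the input rate matters far more than the output rate when budgeting sessions that repeatedly resend a large context.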

The Chinese Open-Source AI Wave

M3 doesn't arrive in isolation. Within days of each other in late May/early June 2026, three Chinese AI labs dropped frontier open-weight models: MiniMax M3, Zhipu's GLM-5.1 (SWE-Bench Pro leader among open models), and Moonshot's Kimi K2.6 (86.3% on BrowseComp agent benchmark). This coordinated open-source strategy from Chinese labs is fundamentally disrupting the assumption that frontier AI requires a closed-source, expensive commercial model.

Key Takeaways
  • MiniMax M3: first open-weight model to combine frontier coding, 1M context, and native multimodality
  • MSA architecture delivers 15x faster decoding and 10x lower cost at 1M token context
  • 59% SWE-Bench Pro — narrows the gap between open and closed frontier models
  • Live today via API/OpenRouter; Hugging Face weights and full tech report releasing soon
  • Part of a broader wave of high-capability open-weight models from Chinese AI labs
Resources · Official Sources · Getting Started
MiniMax M3 Official Launch Blog
MiniMax M3 Model Page and Docs
MiniMax API Platform — Developer Access
MiniMax on Hugging Face — Weights Releasing Soon