MiniMax M3 beats Opus 4.7 on BrowseComp as an open-weights model

MiniMax released M3 on May 29, posting benchmark results that put the open-weights model above Claude Opus 4.7 on at least one public frontier task, with weights shipping to the public in approximately 10 days.

The headline number: BrowseComp 83.5, compared to Opus 4.7’s 79.3. BrowseComp tests agentic web research under real browser conditions, which makes it a more operationally meaningful signal than most static reasoning benchmarks. An open-weights model clearing that bar matters for any team that is currently paying Anthropic API rates to run similar web-agent workflows.

The coding story is more nuanced. MiniMax’s own model page at minimax.io shows M3 scoring 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro on that benchmark while approaching but not exceeding Opus 4.7. The company’s framing positions M3 as the first open-weights model to combine frontier-tier coding, long context, and native multimodality simultaneously. Each of those capabilities has shipped from other labs in isolation; the combination is the claim MiniMax is making.

The context architecture is the cost story. MiniMax built a new attention mechanism called MiniMax Sparse Attention (MSA), which scales context to 1 million tokens while keeping per-token compute at 1/20 the level of the prior MiniMax generation at that length. A guaranteed minimum of 512K tokens is supported through MSA. For teams running document-grounded agents or long-conversation pipelines, that compute ratio shifts the self-hosting economics substantially. This publication covered the technical sparse-attention preview on May 30; the numbers now confirmed in the full release align with what was teased there.
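To make the 1/20 figure concrete, here is a minimal back-of-envelope sketch. It works in arbitrary compute units, since the release gives only the ratio and no absolute costs. The function name and the `baseline_per_token` parameter are hypothetical, and the ratio is stated by MiniMax only at the 1 million token length, so the sketch should not be read as applying to shorter contexts.

```python
# Ratio reported by MiniMax: per-token compute at 1M-token context is 1/20
# of the prior MiniMax generation at the same length.
MSA_COMPUTE_RATIO = 1 / 20
CONTEXT_TOKENS = 1_000_000  # the length at which the ratio is claimed


def relative_compute_at_1m(baseline_per_token: float = 1.0) -> dict:
    """Compare total compute for one 1M-token context, in arbitrary units.

    baseline_per_token is a hypothetical unit cost for the prior generation;
    only the ratio between the two results comes from the announcement.
    """
    prior = CONTEXT_TOKENS * baseline_per_token
    m3 = prior * MSA_COMPUTE_RATIO
    return {"prior": prior, "m3": m3, "saved_fraction": 1 - m3 / prior}


result = relative_compute_at_1m()
print(result)
```

Under these assumptions, a workload that filled the full window would need about 5% of the prior generation's per-token compute, which is the basis for the self-hosting argument above. Real hardware cost also depends on memory, batching, and serving stack, none of which the announcement quantifies.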

The multimodal integration is native, not bolted on. According to the MiniMax announcement on X, multimodal data was part of training from the pretraining stage, across more than 100T tokens, rather than added afterward through a post-trained adapter. That means image and video inputs are grounded in the same weight space as text, and the company says M3 can operate a desktop computer via multimodal grounding. Whether desktop control holds up in adversarial or production conditions is not addressed in the release announcement.

Other agentic benchmark scores from the official announcement thread: 66.0% on Terminal Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas.

The model is available now through MiniMax’s API at platform.minimax.io and through MiniMax Code at code.minimax.io. Pricing: 50% off standard usage for the first seven days, limited to requests with context windows at or below 512K tokens. Weights and a technical report will publish to Hugging Face and GitHub in approximately 10 days from the May 29 announcement.
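For teams budgeting a trial run, the launch discount terms can be expressed as a simple eligibility check. This is a sketch under stated assumptions, not MiniMax's billing logic: it assumes the seven days are counted from the May 29, 2026 announcement, that "512K" means 512 × 1024 tokens, and that the discount applies per request. The function name is hypothetical, and no actual prices appear because the announcement as summarized here does not give them.

```python
from datetime import date, timedelta

LAUNCH_DATE = date(2026, 5, 29)   # announcement date; assumed start of the promo
PROMO_DAYS = 7                    # "first seven days"
PROMO_MAX_CONTEXT = 512 * 1024    # assumption: 512K = 524,288 tokens
PROMO_DISCOUNT = 0.50             # 50% off standard usage


def promo_multiplier(request_date: date, context_tokens: int) -> float:
    """Return the multiplier applied to the standard price for one request."""
    promo_end = LAUNCH_DATE + timedelta(days=PROMO_DAYS)  # exclusive end date
    in_window = LAUNCH_DATE <= request_date < promo_end
    within_cap = context_tokens <= PROMO_MAX_CONTEXT
    if in_window and within_cap:
        return 1 - PROMO_DISCOUNT
    return 1.0


# A 200K-token request during the promo, and a 1M-token request on the same day
print(promo_multiplier(date(2026, 5, 30), 200_000))
print(promo_multiplier(date(2026, 5, 30), 1_000_000))
```

The practical implication is that anyone evaluating M3 specifically for its 1 million token window would pay standard rates even during the promotional week, if the cap works as described.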

That 10-day gap is the practical constraint. Until weights drop, M3 is API-only, which means teams that want to self-host, fine-tune, or run behind a firewall cannot act yet. The benchmark results are also MiniMax-reported, with no independent reproduction published at announcement time.

For builders deciding where to route agentic workloads this quarter: if the BrowseComp number holds under independent eval, M3 becomes the default open-weights candidate for web-research agents at long context, displacing any current instinct to default to Llama or Qwen for that use case. The SWE-Bench Pro position just below Opus 4.7 puts M3 in the tier where it should be tested against frontier closed models before a hosting decision, not assumed inferior.

Source: MiniMax official announcement thread posted May 29, 2026, on X (@MiniMax_AI), mirrored at threadreaderapp.com, with model details at minimax.io/models/text/m3.
