MiniMax, the Shanghai-based AI lab, released its M3 model on Monday with API access live but no downloadable weights. The company says model files and a technical report will arrive within 10 days, targeting roughly June 11, on Hugging Face and GitHub.

The gap between API launch and weight release is deliberate. MiniMax captures inference revenue from the window while preserving the open-weight commitment that drives developer adoption. The strategy mirrors what Mistral did with Mixtral in late 2023: ship the API, build the audience, then release the weights before the developer community loses patience.

The benchmark numbers are notable. MiniMax reports 59.0% on SWE-Bench Pro and 74.2% on MCP Atlas, with 66.0% on Terminal-Bench 2.1. A company-run Hopper test showed M3 running continuously for 24 hours on FP8 matrix multiplication, making 147 benchmark submissions and 1,959 tool calls while raising GPU utilization from 7.6% to 71.3%. Those figures came from MiniMax’s own test harness, which the methodology notes also cite Claude Code scaffolding. Independent verification awaits the weight release.

The architecture behind M3’s long-context performance is MiniMax Sparse Attention, or MSA. Rather than computing full attention across every token in a 1M-token context, MSA pre-filters cached keys and values into blocks, then reads each selected block once. MiniMax claims this cuts per-token compute at the 1M-context length to one-twentieth of its previous-generation model. The claim is plausible given sparse attention research, but the technical report has not published yet.

M3 supports image and video inputs with text output, placing it directly against coding agents that require repository text, screenshots, diagrams, and long tool histories in a single session. The API page lists both Anthropic-compatible and OpenAI-compatible endpoints, which reduces switching friction for teams already running on either stack.

Pricing at standard rates is $0.60 per million input tokens and $2.40 per million output tokens up to 512,000 input tokens. The guaranteed 512K minimum matters because several long-context providers silently throttle at high load. Subscription plans start at $20 per month for roughly 1.7 billion M3 tokens. VentureBeat noted lower first-week promotional rates from MiniMax and platform partners.

MiniMax shares fell 16% in Hong Kong after the M3 launch, per MarketWatch, following the company’s disclosure of plans for a Shanghai STAR Market listing. The report cited a listing-guidance agreement with Citic Securities and a filing with the Shanghai bureau of the China Securities Regulatory Commission. The stock reaction suggests the market sees the STAR filing as a dilution signal, not a vote of confidence in M3.

The South China Morning Post noted that MiniMax did not disclose M3’s parameter count or training compute. That absence limits direct comparison with models whose full training cards are public. Nemotron 3 Ultra, Nvidia’s recent US open-weights release, published its parameter count and training methodology; M3 has not. When the weights drop, parameter count inference from the checkpoint will settle some of those comparisons.

MiniMax has already beaten Anthropic’s Claude Opus 4.7 on BrowseComp with an earlier M3 preview. If the weight release holds to the June 11 window and the performance figures survive independent evaluation, M3 will be the strongest open-weight model available for teams that need coding, vision, and long-context work in one checkpoint.

Teams evaluating open-weight models for coding agent pipelines should hold procurement decisions until June 11, when the M3 weights either arrive as promised or do not.

Implicator (implicator.ai), 2026-06-02.