AI’s loudest fight this week was not about benchmarks. It was about borders. While heads of state and lab chiefs huddled at the G7 over who governs frontier models, China’s labs shipped capability and raised capital fast enough to make the coalition talk feel overdue.
The Governance Reckoning: Who Controls the Frontier
Frontier AI politics moved from essays to closed-door summits, and the people who build the models are now the ones drawing the maps.
- AI chiefs pitch a US-led frontier coalition at the G7. Dario Amodei and Demis Hassabis used a closed-door G7 lunch to float allied access to frontier models and a chip-trade bloc that cuts out China. Canada’s prime minister said yes.
- When the US government export-controlled Anthropic’s own models. The Trump administration’s surprise clampdown on Fable and Mythos exposed a governance vacuum no executive order has filled. Every frontier lab should read it as a warning.
- OpenAI tests models against real user conversations before release. A new pre-release method replays anonymized production chats through candidate models to catch the safety failures that scripted benchmarks miss, and it extends to tool-using agents.
Built for Agents, Not Humans
The software stack is being torn up and rebuilt around code that writes, ships, and pays for itself.
- Cursor launches Origin, a Git forge built for parallel AI agents. Origin treats version control as infrastructure for autonomous software factories rather than a collaboration tool for humans, a direct shot at GitHub’s human-scale model.
- Android 17 turns every app into an agent-callable tool. Google shipped AppFunctions and an on-device MCP layer that lets agents discover and run app capabilities locally, turning the OS itself into an agent platform.
- Codex gains live browser control via Chrome DevTools Protocol. OpenAI wired its coding agent straight into the browser, so Codex can read network traffic, profile JavaScript, and rewrite a live page’s DOM without external connectors.
- Stop paying twice: the gateway buffer fix for agent crashes. When an agent dies mid-stream, every major provider except OpenAI makes you re-prompt and re-pay for every token. A durable gateway buffer closes the leak.
The Model Race Tilts East
China’s labs set the week’s pace on both capability and capital, and they did it mostly in the open.
- Z.ai ships GLM-5.2 with a 1M-token context for full codebases. The Hangzhou lab’s third GLM-5 release targets agentic software engineering with a five-fold context jump and dual reasoning modes, with MIT-licensed weights promised next week.
- DeepSeek raises $7.4B to become China’s most valuable AI startup. A $50 billion valuation, with $3 billion of it from founder Liang Wenfeng’s own pocket, signals DeepSeek is no longer just a research story.
- Weibo’s 3B model matches flagships on LeetCode, falls short on science. VibeThinker-3B beats far larger models on first-attempt LeetCode and AIME, then collapses on graduate science, reopening the fight over what benchmarks actually measure.
- Qwen-RobotWorld makes plain language the control layer for robots. The Qwen team’s new video world model uses natural language as a single action interface across robotic manipulation, driving, and indoor navigation.
Silicon, Voice, and the Next Interface
The hardware and interface layer is sprinting to keep up with the models running on top of it.
- NVIDIA claims an MLPerf Training 6.0 sweep on Blackwell. NVIDIA says it posted the fastest times across all seven training benchmarks, including an 8,192-GPU run, in results worth reading as a vendor’s own scorecard.
- Qualcomm bets on 40+ AI wearables as the post-smartphone platform. With two new hardware platforms, Qualcomm is positioning itself as the foundational silicon for whatever form factor finally displaces the phone.
- OpenAI’s GPT-Bidi-1 aims to fix voice mode’s turn-taking problem. A bidirectional audio model in testing would let ChatGPT listen and speak at once, absorbing interruptions instead of freezing mid-response.
- Microsoft extends Phi Silica to NVIDIA GPUs. Microsoft is piloting its on-device small models on NVIDIA RTX hardware, widening local AI reach beyond the NPUs of Copilot+ PCs.
Today’s Quick Hits
- The benchmark ceiling: why frontier evals need a rebuild. As top models saturate standard tests, measuring real progress at the frontier is getting harder, and the eval playbook is overdue for a rewrite.
- Anthropic pulls back on its Agent SDK billing split before it hit. Hours before a pricing change would have separated SDK-driven Claude usage from standard API billing, Anthropic reversed course and says it is rethinking how subscriptions serve builders.