OpenAI taped out a custom inference chip while Amazon quietly locked in a power lead that runs through 2030, Anthropic accused Alibaba-affiliated operators of running 28.8 million illicit model queries and then watched two more of Google’s top researchers walk through its doors, and a wave of new agents stopped demoing and started doing real work.
The Compute Land Grab: Owning Silicon, Power, and the Full Stack
Two moves this week redrew the infrastructure map: one lab taped out its own chip, while the cloud giant with the deepest data-center footprint quietly extended its power lead through 2030.
- OpenAI and Broadcom build Jalapeño, a custom LLM inference chip. A nine-month co-development sprint from blank slate to tape-out signals OpenAI’s intent to own its compute destiny, not just rent it from Nvidia. The chip is purpose-built for inference, the cost center that scales with every paying user.
- Amazon Holds the Power Lead in AI Infrastructure Through 2030. Two decades of data-center construction give Amazon a structural head start in the electricity-constrained race to scale AI compute, with Google closing ground but not catching up. In a world where power is the binding constraint, prior investment compounds.
The US-China Trust Fracture: Talent, Theft, and Senate Testimony
Anthropic brought receipts to Capitol Hill and hired away more of Google’s best researchers in the same week: one move puts a confrontation with a Chinese rival on the record, the other deepens a talent fight at home.
- Anthropic tells Senate Alibaba ran 28.8M illicit model queries. A June 10 letter to the Senate Banking Committee accuses Alibaba-affiliated operators of the largest known distillation attack against Anthropic: 28.8 million queries sent through 25,000 fraudulent accounts over six weeks. That figure is now part of the congressional record.
- Two More Gemini Architects Head to Anthropic. Jonas Adler and Alexander Pritzel, who helped build Google’s flagship model, are the latest in a string of senior researchers to leave for a competitor. The steady exit of Gemini architects to Anthropic is becoming a pattern Google will have to address structurally.
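For a sense of scale, the totals in Anthropic’s letter can be turned into per-account rates. The sketch below is back-of-envelope arithmetic only: it assumes the queries were spread evenly across all 25,000 accounts and across a full 42 days, neither of which the letter, as summarized here, confirms.

```python
# Totals as reported in the Anthropic item above.
TOTAL_QUERIES = 28_800_000
ACCOUNTS = 25_000
DAYS = 6 * 7  # "six weeks"; assumes all 42 days were used


def distillation_rates(total_queries, accounts, days):
    """Average rates under an even-spread assumption (not stated in the source)."""
    per_account = total_queries / accounts
    return {
        "queries_per_account": per_account,
        "queries_per_account_per_day": per_account / days,
        "queries_per_day_overall": total_queries / days,
    }


rates = distillation_rates(TOTAL_QUERIES, ACCOUNTS, DAYS)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions each account averages 1,152 queries in total, or roughly 27 a day.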
Agents at Work: Automation Enters Real Workflows
This week’s agent news is less about capability announcements and more about products crossing the threshold from demo to deployment, covering everything from browser control to legal back-offices.
- Google brings computer use into Gemini 3.5 Flash. The lightweight model can now click, scroll, and type across desktop and browser environments, making agentic automation affordable at scale. Putting computer use in a cost-optimized model changes the economics of deploying it in high-volume workflows.
- Alibaba’s Qwen team built a model that simulates the environments AI agents work in, not just the agents. Qwen-AgentWorld is trained on 10 million interaction trajectories to act as a language-based world model that can stand in for real environments during agent training. The result is cheaper, faster iteration on agent behavior without live system access.
- Stably ships Orca, an open-source IDE built for fleets of coding agents. As developers routinely run five or more coding agents in parallel, Stably’s Orca offers a dedicated orchestration layer to manage them all from one desktop app. The tool treats multi-agent coding as an infrastructure problem, not a workflow one.
- Perplexity Targets Legal Ops with Computer for Counsel. The search-to-answer company enters vertical enterprise AI, betting legal back-offices are ready for automation before the bar is. Targeting legal operations is a deliberate wedge: high document volume, low tolerance for error, and deep institutional pain.
Open Weights Closing In: Models That Work Outside the Lab
Two open-weight developments this week matter not for benchmark scores but for what they actually do inside real developer workflows.
- GLM-5.2 is the first open model that holds up in a coding agent harness. Z.ai’s latest release crossed a threshold that prior open-weight models missed: it works as a general agent inside real coding workflows, not just benchmarks. That distinction matters for teams evaluating whether they can run capable agents on self-hosted infrastructure.
- NVIDIA’s NeMo AutoModel cuts MoE fine-tuning cost with one import swap. A new open library layers Expert Parallelism and fused communication kernels on top of Hugging Face Transformers v5, claiming 3.4 to 3.7x faster training with 29 to 32% lower GPU memory use. The one-import-swap design lowers the barrier for teams already in the Hugging Face ecosystem.
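To read those claims as budgets rather than multipliers, the sketch below applies them to a made-up fine-tuning run. The baseline of 100 GPU-hours and 80 GB is hypothetical, and the code assumes "3.4 to 3.7x faster" means a throughput multiple, so wall-clock time divides by that factor.

```python
def project(baseline_hours, baseline_mem_gb, speedup, mem_reduction):
    """Scale a baseline run by a claimed speedup and fractional memory saving."""
    hours = baseline_hours / speedup
    mem_gb = baseline_mem_gb * (1 - mem_reduction)
    return hours, mem_gb


# Hypothetical baseline, chosen only to make the arithmetic easy to read.
BASELINE_HOURS = 100.0
BASELINE_MEM_GB = 80.0

# Low and high ends of the claimed ranges from the item above.
low_hours, low_mem = project(BASELINE_HOURS, BASELINE_MEM_GB, 3.4, 0.29)
high_hours, high_mem = project(BASELINE_HOURS, BASELINE_MEM_GB, 3.7, 0.32)

print(f"low end:  {low_hours:.1f} h, {low_mem:.1f} GB")
print(f"high end: {high_hours:.1f} h, {high_mem:.1f} GB")
```

On that reading, the hypothetical run would take about 27 to 29 hours instead of 100 and peak at roughly 54 to 57 GB instead of 80.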
The Browser Wars: Who Controls the Shopping Layer
Amazon’s lawsuit against Perplexity’s Comet browser reframes a technical dispute as a security threat, but the real fight is over first-party commerce intent data.
- Amazon vs. Perplexity Is a Fight Over Who Owns Your Browser. On paper the suit concerns a user-agent identity question that Amazon casts as a security threat; in practice it is about who controls the shopping experience. The outcome could set a precedent for whether AI-native browsers can route around incumbent platform controls.
Quick Hits
- OpenAI Quietly Upgrades GPT-5.5 Instant Inside ChatGPT. The update ships to free and paid users with promises of more natural responses, but OpenAI released no benchmark data alongside the claim.
- Mirendil Raises $200M Seed to Bring AI to Scientific Research. The Anthropic-alumni startup secured one of the largest seed rounds on record to build AI tools that help scientists develop their own models.