The Cost Reckoning Lands While The Stack Argues About Memory

Anthropic’s S-1 hit the same week Bain told the market that 40% of surveyed companies are seeing AI cost savings below 10%, and the cost-side critique now has receipts. Underneath, the architectural conversation moved too: three independent pieces argued that the memory layer in production agent harnesses is the wrong abstraction, one of them backed by a 57 to 71% cross-user contamination figure.

The Cost Reckoning Meets The IPO Calendar

The day after Anthropic filed its draft S-1, the corporate cost backlash got specific, the open-source bear case got published, and Vercel made an uncomfortable observation about where part of the bill is going.

The bill arrives as Anthropic files for its IPO — A Bain survey of nearly 1,000 companies found 40% reporting AI cost savings below 10%. One CFO accidentally spent half a billion dollars on Claude in a single month. The IPO is colliding with its own customer-side economics.
Two AI economies, two exponentials, one S-1 to price them both — Nathan Lambert argues open and closed models are on different exponentials, not different points of the same curve. On his account, the open economy’s total market value will exceed that of OpenAI and Anthropic combined. That is the structural bear case the IPO will have to answer.
Vercel’s own docs site got hit with an inference theft attack — Attackers find exposed AI endpoints and resell the inference at a markup. Traditional rate limits do not stop it. Some of the corporate AI bills now include theft nobody priced in.

Memory Becomes The Argument

A Sentra essay, a Mem0 audit, and a research-stage attention variant landed within hours of each other, and they all argued that the unstructured-context-window era is ending.

The boulder problem and why AI memory systems are built on the wrong abstraction — Sentra’s Ashwin Gopinath argues memory is not a service, it is state. The same customer email becomes a different artifact for sales, product, legal, finance, and the executive. Freezing ontology at ingestion traps everything downstream inside a frame that was prematurely right.
Memory leaks in agent harnesses hit 57 to 71% cross-user contamination — Mem0 surveyed eight production agent harnesses, from Claude Code and Codex to Devin and Bedrock AgentCore. The headline is the contamination rate. The structural finding is that every harness ships keyword retrieval and bounded local storage, which is the wrong architecture for the next layer of work.
Tilde Research open-sources an attention variant built for long-context reasoning — Wall Attention organises long-context information around persistent memory tokens that get dedicated compute, instead of letting attention scores diffuse across a million tokens. It is the architectural answer to the same critique Sentra and Mem0 made above.
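For readers who want the contamination idea made concrete, here is a minimal sketch of what cross-user leakage in a keyword-retrieval memory store can look like, and how scoping retrieval by owner removes it. Everything in it is an illustrative assumption: the Memory record, the keyword_search and contamination_rate helpers, and the metric definition (share of retrieved items owned by someone other than the requester) are hypothetical and are not Mem0's audit methodology or any harness's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """Hypothetical memory record: who it belongs to and what it says."""
    owner: str
    text: str


def keyword_search(memories, query, user_id=None):
    """Naive keyword retrieval. With user_id=None it searches every user's
    memories (the leaky case); with a user_id it only searches that user's."""
    terms = query.lower().split()
    return [
        m for m in memories
        if (user_id is None or m.owner == user_id)
        and any(t in m.text.lower() for t in terms)
    ]


def contamination_rate(results, user_id):
    """Assumed metric: fraction of retrieved memories owned by another user."""
    if not results:
        return 0.0
    foreign = sum(1 for m in results if m.owner != user_id)
    return foreign / len(results)


memories = [
    Memory("alice", "Alice prefers the staging database"),
    Memory("bob", "Bob's production database password rotates weekly"),
    Memory("alice", "Alice's deploy target is eu-west"),
]

# Same query, issued on Alice's behalf, with and without owner scoping.
unscoped = keyword_search(memories, "database")
scoped = keyword_search(memories, "database", user_id="alice")

print(contamination_rate(unscoped, "alice"))  # 0.5
print(contamination_rate(scoped, "alice"))    # 0.0
```

The point of the toy is only that keyword retrieval over a shared pool has no notion of ownership unless the scope is enforced at query time. Real harnesses will differ in how memories are stored and partitioned.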

Coding Agents Press On Every Surface

OpenAI extended Codex out of engineering, Microsoft launched seven models aimed at customer-tunable inference, and GitHub’s COO told Latent Space that the platform was not designed for what is now running on it.

OpenAI pushes Codex into six white-collar verticals — Six new plug-ins for data analytics, creative production, sales, product design, equity investing, and investment banking. 62 apps, 110 skills. Codex is no longer just a coding agent. The procurement target is now the department head, not the CTO.
Microsoft’s MAI bets enterprises will tune their own models — Seven new MAI models, a Mayo Clinic healthcare partnership, and Frontier Tuning, which lets developers tune model weights against their own reinforcement learning environments. The strategic posture is clear: cheaper inference, more control, less OpenAI dependency.
GitHub’s infrastructure was not built for a 1,400% spike in agent-shipped code — GitHub COO Kyle Daigle told Latent Space that agent-shipped code grew 1,400% in 2026. The fork-and-PR loop, calibrated for human cadence, is breaking. Daigle’s pivot: branches become tasks, agents become first-class principals, the repo becomes the runtime.

The Frontier Keeps Moving

MiniMax committed to a 10-day weights release timeline, Perplexity inverted the cloud-only assumption, and a16z made the architectural case for code as output across design and 3D.

MiniMax M3 ships API-first with weights still pending — 1M-token context, native multimodality, frontier coding, and $0.60 per million input tokens. Weights and a technical report ship within 10 days. If the weights land as promised, it would be the first open-weights model that genuinely competes on the dimensions enterprises actually procure on.
Perplexity turns the AI PC into a cloud traffic controller — Aravind Srinivas demoed Perplexity’s hybrid inference orchestrator with Intel CEO Lip-Bu Tan onstage at Computex. The pitch: sensitive workloads stay on the device, the cloud handles the rest, the AI PC finally gets a real reason to exist.
Visual AI’s real leap is generating code, not pixels — a16z argues the visual AI shift is from pixel output to source code. A code-generated UI is a real component. A code-generated 3D scene is parametric. The artifact is now structured, which means it can be edited, version-controlled, and iterated on without re-running the model.

Mythos Quietly Scales

Anthropic has built a private, vetted distribution channel for a model it has publicly declined to release.

Anthropic’s private Mythos channel scales to 150 orgs across 15+ countries — Project Glasswing partners have collectively found over 10,000 high or critical-severity vulnerabilities. The new expansion targets critical infrastructure sectors: power, water, healthcare, communications, hardware. The product story is impressive. The governance question is who decides which 150 organisations qualify.

Today’s Quick Hits

TinyFish open-sources Bigset, a prompt-to-dataset agent — Natural-language prompt in, structured dataset from the live web out. The open-source alternative to enterprise data acquisition vendors.