Faster Models, Committed Compute, and Credentialed Images

Three threads run through today’s news: frontier inference is accelerating while getting cheaper to run, compute access is being packaged as a multi-year commercial product, and the infrastructure beneath agent workflows is consolidating into fewer, more structured layers.

Speed and Specialization: Frontier Models Get Faster and More Focused

Three model releases today share a common logic: performance at a lower cost per token, targeted at specific workloads rather than general leaderboard climbing.

Google launches Gemini 3.5 Flash with a 4x speed edge over rival frontier models — Google’s Flash-tier model beats Gemini 3.1 Pro on agentic benchmarks while running four times faster than comparable frontier models, and it ships today into Search, Android Studio, and six named enterprise integrations.
AllenAI ships OlmoEarth v1.1 with 3x compute reduction — OlmoEarth v1.1 matches its predecessor on remote-sensing benchmarks while cutting inference costs by roughly two-thirds, making continuous planet-scale mapping feasible at a fraction of the prior budget.
NVIDIA open-sources LongLive 2.0 for real-time interactive video — NVlabs is releasing the full training and inference stack. Teams building long-form video products now have a production-grade reference architecture rather than a paper result to re-implement from scratch.

Compute Locks: Capacity Becomes a Structured Commercial Product

OpenAI is borrowing directly from cloud-provider playbooks to convert unpredictable compute access into a contractual guarantee, and a new generation of retrieval infrastructure is compressing the cost of the search layer that sits beneath it.

OpenAI launches Reserved Instances for AI compute — The Guaranteed Capacity program lets enterprise customers commit to one-, two-, or three-year compute blocks at a discount, a structure AWS and Google Cloud have used to lock in enterprise spend for years.
Six new Ettin rerankers displace the ms-marco-MiniLM baseline — A family of CrossEncoder rerankers from 17M to 1B parameters beats the long-dominant MiniLM baseline on MTEB and NanoBEIR. Teams still running ms-marco-MiniLM as a default have a concrete, benchmarked reason to swap.
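
For teams weighing the reranker swap above, the change is usually small if the pipeline treats the reranker as a pluggable scorer. The following is a minimal sketch under that assumption, not the Ettin authors' code. The names `rerank`, `PairScorer`, and `overlap_scorer` are hypothetical, and the toy word-overlap scorer only stands in for a real model. In practice `score_pairs` would likely wrap a cross-encoder's batch prediction call over (query, document) pairs, with the model identifier as the only thing that changes between the old baseline and a new reranker.

```python
from typing import Callable, Sequence

# A scorer maps (query, document) pairs to one relevance score per pair.
# Hypothetical type alias for illustration; a real cross-encoder would
# be wrapped to match this shape.
PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    docs: Sequence[str],
    score_pairs: PairScorer,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, doc) pair and return the top_k docs, best first."""
    pairs = [(query, doc) for doc in docs]
    scores = score_pairs(pairs)
    ranked = sorted(zip(docs, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Toy stand-in scorer: fraction of query words that appear in the doc."""
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        scores.append(len(q_words & d_words) / max(len(q_words), 1))
    return scores


docs = [
    "Rerankers reorder retrieved passages by relevance",
    "Reserved capacity trades commitment for discounts",
    "Cross-encoder rerankers score each query and passage together",
]
results = rerank("how do rerankers score a passage", docs, overlap_scorer, top_k=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
```

Because the ranking logic never touches the model directly, comparing the old baseline with a candidate replacement comes down to calling `rerank` twice with two different scorers on the same queries.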

Agent Plumbing: Context Formats and Cross-Harness Orchestration

Two agent infrastructure stories today point in the same direction: the tooling layer beneath autonomous coding workflows is consolidating around richer context formats and unified control planes.

Claude Code drops Markdown for HTML as its default context format — Anthropic’s internal write-up explains the switch: HTML can carry tables, SVG, and interactive controls that Markdown cannot, which matters specifically for spec documents, planning artifacts, and UI prototypes.
Warp launches Oz as a multi-harness control plane for cloud agents — Oz can orchestrate Claude Code, Codex, and Warp Agent from one interface, with cross-harness memory that persists organizational context across all three.

Provenance, Capital, and the Long Record: What Persists After the Hype

Three pieces today examine durability: whether cryptographic content credentials can actually move from policy to infrastructure, whether $370 billion in AI philanthropy will flow or stay pledged, and whether the model half-life narrative holds up when measured against actual release data.

OpenAI adopts C2PA and SynthID for AI image provenance — OpenAI is embedding C2PA cryptographic credentials and Google DeepMind’s SynthID watermarking into its image surfaces, converting content provenance from a policy position into infrastructure.
$370 billion in AI philanthropy is real, but illiquid — The structural constraint is not the dollar amount. Pledges are not disbursements, and the institutions equipped to absorb that capital at speed are not yet built to match the incoming volume.
The “model half-life” myth does not survive contact with data — Release cadences have accelerated, but the claim that model relevance halves on a predictable geometric curve is a metaphor without numbers supporting it.
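
To make the half-life claim above concrete: a true half-life means relevance follows pure exponential decay, so any two observations should imply the same half-life regardless of which interval is measured. The sketch below shows that arithmetic only. The function names and the sample numbers are hypothetical illustrations, not data from the piece.

```python
import math


def relevance_after(months: float, half_life_months: float) -> float:
    """Fraction of initial relevance left after `months`, under pure exponential decay."""
    return 0.5 ** (months / half_life_months)


def implied_half_life(share_start: float, share_end: float, months_elapsed: float) -> float:
    """Half-life in months implied by two observations, assuming exponential decay."""
    if not (0 < share_end < share_start):
        raise ValueError("expected a strictly declining, positive share")
    return months_elapsed * math.log(2) / math.log(share_start / share_end)


# Hypothetical example: a model's usage share falls from 1.0 to 0.25 in 12 months.
half_life = implied_half_life(1.0, 0.25, 12)
print(round(half_life, 6))  # 6.0, since two halvings fit into 12 months
print(relevance_after(12, half_life))
```

If the implied half-life drifts widely when the same calculation is repeated over different windows or different models, the curve is not geometric and the "half-life" label is only a figure of speech.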

Today’s Quick Hits

Andrej Karpathy joins Anthropic for frontier R&D — Karpathy is temporarily stepping away from his education work to do research at Anthropic, in one of the more consequential researcher moves of the year.
Cerebras clocks Kimi K2.6 at ~1,000 tokens per second — Artificial Analysis measured the trillion-parameter Moonshot AI model running on Cerebras hardware at roughly 1,000 tokens per second, the fastest speed ever recorded for a frontier model.

AI Insiders

The morning brief for people inside the AI industry.