On April 12, 2026, traffic to Vercel’s documentation AI chat endpoint climbed to roughly ten times its normal volume, hitting 1,300 requests per minute at peak. The cost run rate at that moment exceeded ten thousand dollars per day. The attack lasted two days, and throughout that time standard per-IP rate limits had no meaningful signal to act on.

Vercel’s CTO Malte Ubl and content engineering lead Eric Dodds documented the attack in a post on the Vercel engineering blog on May 29, 2026. The writeup is worth reading not as a product announcement but as a threat model update: inference theft has matured into a structured resale business, and the security assumptions most teams ship with are not designed to stop it.

The mechanics are specific. Attackers build an OpenAI-compatible adapter on top of a victim’s endpoint, a one-time engineering cost that lets the stolen calls drop into any standard SDK or coding agent. They then fan requests through residential proxy pools, which spreads the traffic across thousands of apparent IP addresses. Each address stays under its own limit, so a per-IP rate limiter sees what looks like thousands of ordinary individual users. The adapter holds the attacker’s customer session; by the time a call hits the real endpoint, it has already crossed the authentication boundary the developer was relying on. A check that runs once at session start gets amortized across every subsequent stolen call.
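A small simulation makes the rate-limit blind spot concrete. This is only a sketch: the 1,300 requests per minute figure is the peak quoted above, while the per-IP limit of 20 and the pool of 2,000 proxy addresses are assumed values chosen for illustration, and the round-robin rotation is a simplification of how a real proxy pool behaves.

```typescript
// Peak volume quoted in the article; the other two constants are assumptions.
const PEAK_REQUESTS_PER_MINUTE = 1300;
const PER_IP_LIMIT_PER_MINUTE = 20; // hypothetical per-IP limit
const PROXY_POOL_SIZE = 2000; // hypothetical residential proxy pool size

// Count requests per source IP for one minute of traffic, assuming the
// sender rotates round-robin through `poolSize` addresses.
function requestsPerIp(total: number, poolSize: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < total; i++) {
    const ip = `proxy-${i % poolSize}`;
    counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return counts;
}

// Return the IPs whose request count exceeds the per-IP limit.
function blockedIps(counts: Map<string, number>, limit: number): string[] {
  const blocked: string[] = [];
  counts.forEach((n, ip) => {
    if (n > limit) blocked.push(ip);
  });
  return blocked;
}

const distributed = requestsPerIp(PEAK_REQUESTS_PER_MINUTE, PROXY_POOL_SIZE);
const singleSource = requestsPerIp(PEAK_REQUESTS_PER_MINUTE, 1);

// Same total volume, very different picture for the limiter.
console.log(blockedIps(distributed, PER_IP_LIMIT_PER_MINUTE).length); // 0
console.log(blockedIps(singleSource, PER_IP_LIMIT_PER_MINUTE).length); // 1
```

Under these assumptions every proxy address sends a single request in the minute, so the limiter blocks nothing, while the same volume from one address would be blocked immediately.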

The Vercel post names a public example. A project called Chipotlai Max ships a proxy that routes through Chipotle’s customer-support chatbot and presents it as an OpenAI-compatible endpoint. The project’s GitHub page openly asks contributors to port the same approach to Home Depot, Lowe’s, Target, and Starbucks. The resale math works because frontier model inference costs roughly $2 per prompt while the marginal cost of resale approaches zero; selling access at five to ten percent of list price still produces a high-margin product.
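The resale arithmetic can be worked through directly. The sketch below uses only the rough figures quoted above ($2 per prompt, resale at five to ten percent of list); the batch size of 1,000 prompts is an arbitrary choice for illustration.

```typescript
// Rough list cost per frontier-model prompt, as quoted in the text.
const LIST_COST_PER_PROMPT_USD = 2;

// Resale price when access is sold at a given fraction of list price.
function resalePrice(listCostUsd: number, fractionOfList: number): number {
  return listCostUsd * fractionOfList;
}

// Per 1,000 prompts: what the victim is billed vs. what the reseller collects.
const PROMPTS = 1000; // arbitrary batch size for illustration
const victimBillUsd = PROMPTS * LIST_COST_PER_PROMPT_USD;
const resellerLowUsd = PROMPTS * resalePrice(LIST_COST_PER_PROMPT_USD, 0.05);
const resellerHighUsd = PROMPTS * resalePrice(LIST_COST_PER_PROMPT_USD, 0.1);

console.log(
  victimBillUsd.toFixed(2), // "2000.00"
  resellerLowUsd.toFixed(2), // "100.00"
  resellerHighUsd.toFixed(2), // "200.00"
);
```

The reseller collects $100 to $200 per thousand prompts while the victim absorbs the $2,000 bill, which is why the margin stays high even at a steep discount.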

This connects directly to a pattern finance teams have started flagging. When a CFO sees an anomalous monthly Claude or OpenAI bill, the diagnostic instinct is to audit internal usage: which teams, which workflows, which models. Inference theft adds a category of spend that internal audits miss entirely, because the API key is being used as intended, just not by the people who are authorized to use it.

Vercel’s defense is their BotID product, powered by Kasada, which uses client-side machine learning to classify each request as human, bot, or agent without a visible challenge. During the April attack, BotID blocked more than ten thousand bot requests within the first minutes of the spike; volume returned to normal within twenty-four hours. The key architectural constraint is that the check must run per request, not per session. A per-session gate costs the attacker one bypass; a per-request gate costs one bypass per call, which collapses the economics of bulk resale.
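The per-session versus per-request difference can be expressed as a cost per resold call. In this sketch the $0.50 cost per bypass and the 10,000 calls per session are hypothetical numbers, not figures from the Vercel post; the comparison point is the $0.10 to $0.20 resale price implied by the figures quoted earlier.

```typescript
type GateMode = "per-session" | "per-request";

// Number of bypasses the attacker must pay for: one per session when the
// gate runs at session start, one per call when it runs on every request.
function bypassesNeeded(mode: GateMode, sessions: number, calls: number): number {
  return mode === "per-session" ? sessions : calls;
}

// Attacker's bypass cost amortized over each resold call.
function bypassCostPerCall(
  mode: GateMode,
  sessions: number,
  calls: number,
  costPerBypassUsd: number,
): number {
  if (calls <= 0) throw new Error("calls must be positive");
  return (bypassesNeeded(mode, sessions, calls) * costPerBypassUsd) / calls;
}

// Hypothetical inputs: one session, 10,000 resold calls, $0.50 per bypass.
const perSession = bypassCostPerCall("per-session", 1, 10_000, 0.5);
const perRequest = bypassCostPerCall("per-request", 1, 10_000, 0.5);

console.log(perSession.toFixed(5)); // "0.00005"
console.log(perRequest.toFixed(5)); // "0.50000"
```

With these assumed inputs, a per-session gate adds a negligible cost per call, while a per-request gate pushes the attacker's cost per call above the price the stolen access sells for.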

The implementation is a few lines in a Next.js route handler: call checkBotId() before the AI SDK call, return 403 if the classification is bot. The client-side counterpart registers the protected path so BotID attaches challenge headers to the request. The Vercel docs cover the next.config.ts wrapper for the full setup.
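The shape of that per-request gate can be sketched without any framework dependency. This is not Vercel's code: `BotVerdict`, `shouldBlock`, and `withBotGate` are hypothetical names, and the sketch assumes the classifier resolves to an object carrying an `isBot` boolean, which should be checked against the BotID documentation. The classifier is passed in as a parameter so the example runs on its own.

```typescript
// Assumed shape of a classification result (hypothetical; verify against
// the BotID docs before relying on it).
interface BotVerdict {
  isBot: boolean;
}

// Pure decision: should this request be rejected?
function shouldBlock(verdict: BotVerdict): boolean {
  return verdict.isBot;
}

// Wrap a handler so the classifier runs on EVERY request, before any
// expensive model call. `deny` builds the rejection response (e.g. a 403).
function withBotGate<Req, Res>(
  classify: () => Promise<BotVerdict>,
  deny: () => Res,
  handler: (req: Req) => Promise<Res>,
): (req: Req) => Promise<Res> {
  return async (req: Req): Promise<Res> => {
    const verdict = await classify();
    if (shouldBlock(verdict)) {
      return deny(); // the model is never invoked for a bot request
    }
    return handler(req);
  };
}

// Stand-in usage with plain strings in place of real request/response types.
const gatedChat = withBotGate<string, string>(
  async () => ({ isBot: false }), // stand-in classifier
  () => "403 Forbidden",
  async (prompt) => `model reply to: ${prompt}`, // stand-in for the AI SDK call
);

gatedChat("hello").then((res) => console.log(res)); // "model reply to: hello"
```

In a Next.js route handler, the classifier argument would be the `checkBotId` function the post describes, `deny` would return a response with status 403, and the wrapped function would be exported as the route's `POST` handler. The client-side registration and the next.config.ts wrapper are still required for the classifier to have anything to inspect.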

The defense is not complete on its own. BotID is a Vercel-platform feature, so teams deploying on AWS Lambda or Cloudflare Workers need a comparable per-request verification layer. The post does not evaluate alternatives or provide independent data on false-positive rates, and the attack described was on Vercel’s own infrastructure, so the numbers reflect Vercel’s specific traffic patterns and attacker profile.

Teams that have shipped AI features to production without per-request bot classification should audit their endpoints now, starting with anything that gives callers flexible prompt control: playgrounds, general-purpose chat, and documentation assistants with loosely constrained system prompts.

Vercel engineering blog (vercel.com/blog), 2026-06-02.