OpenAI and Broadcom build Jalapeño, a custom LLM inference chip

The nine-month co-development sprint from blank slate to tape-out signals OpenAI's intent to own its compute destiny, not just rent it from Nvidia.

Alessandro Benigni

PUBLISHED JUN 26, 2026

OpenAI and Broadcom have taped out Jalapeño, a custom silicon accelerator built from scratch for large language model inference, with scale deployment targeted for late 2026. The joint announcement, published by OpenAI on June 24, marks the first concrete hardware output of a partnership the two companies had previously signaled but not detailed.

The chip was not adapted from a general-purpose accelerator. According to the OpenAI and Broadcom announcement, Jalapeño was designed specifically around how LLMs actually behave at inference time: the bottlenecks of data movement, the imbalance between compute density and memory bandwidth, networking efficiency at the system level, and overall workload behavior across large deployments. That is a meaningfully different design philosophy from taking a training-focused GPU and tuning it for serving.
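To make the compute-versus-bandwidth imbalance concrete, here is a minimal roofline-style sketch. It is an illustration only: the peak throughput and memory bandwidth figures are hypothetical placeholders, not Jalapeño specifications, and the intensity estimate assumes a dense model where every weight is read once per decode step, ignoring KV cache and activation traffic.

```python
# Roofline sketch: is a decode step limited by compute or by memory bandwidth?
# All hardware numbers are HYPOTHETICAL and do not describe any real chip.

PEAK_FLOPS = 1.0e15      # hypothetical compute ceiling, FLOP/s
MEM_BANDWIDTH = 4.0e12   # hypothetical memory bandwidth, bytes/s


def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute ceiling and bandwidth * intensity."""
    return min(peak_flops, mem_bandwidth * intensity)


def decode_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte moved for one decode step.

    Assumption: each weight is read once per step and contributes one
    multiply and one add (2 FLOPs) for every sequence in the batch.
    """
    return 2.0 * batch_size / bytes_per_param


def is_memory_bound(batch_size: int, bytes_per_param: float = 2.0) -> bool:
    """True when the bandwidth roof sits below the compute roof."""
    intensity = decode_intensity(batch_size, bytes_per_param)
    return MEM_BANDWIDTH * intensity < PEAK_FLOPS


if __name__ == "__main__":
    for batch in (1, 8, 64, 512):
        flops = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, decode_intensity(batch))
        label = "memory-bound" if is_memory_bound(batch) else "compute-bound"
        print(f"batch={batch:4d}  attainable={flops:.2e} FLOP/s  ({label})")
```

Under these assumed numbers, small-batch decoding leaves most of the compute idle while waiting on memory, which is the kind of mismatch an inference-first design would try to close.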

The timeline is the most notable operational detail. OpenAI and Broadcom say the chip went from initial design to manufacturing tape-out in nine months. The companies framed that as a possible record turnaround for an advanced, high-performance custom processor, and credited it partly to putting OpenAI’s own models to work on portions of the design. Tom’s Hardware described the part as a massive reticle-sized ASIC. OpenAI described the chip’s performance per watt as significantly better than current alternatives, but the announcement does not include independent benchmark results to support that claim.

The strategic context matters more than the benchmarks. OpenAI currently runs the world’s most expensive inference operation, and almost all of it runs on Nvidia hardware. Custom silicon is the path Alphabet took with its Tensor Processing Units, the path Amazon took with Trainium and Inferentia, and the path Meta is building toward with its MTIA series. Every one of those companies now holds meaningful pricing leverage over its own AI workloads. OpenAI, until Jalapeño, did not. CNBC reported that OpenAI framed the chip explicitly as part of an effort to build the full stack.

Broadcom is not a passive manufacturer here. The company is one of the few in the world capable of co-designing and producing a reticle-limited ASIC at this speed, and the partnership is structured across multiple chip generations, not just this one. OpenAI and Broadcom said Jalapeño is the first chip in a multi-generation custom compute platform. That framing suggests OpenAI is building an internal silicon roadmap, not running a one-time experiment.

For builders running inference workloads at scale, the near-term implication is indirect: Jalapeño is not available to external developers. Its immediate effect is on OpenAI’s own cost structure. If the performance-per-watt advantage holds under production conditions, OpenAI can serve the same token volume at lower energy and infrastructure cost, which creates room to hold or reduce API pricing as GPU-based competitors continue to scale.
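The link between performance per watt and serving cost is simple arithmetic, sketched below. Every input is a hypothetical placeholder chosen for illustration; neither company has published power draw, throughput, or energy pricing for Jalapeño, and the sketch covers electricity only, not hardware, cooling, or networking.

```python
# Energy cost per million tokens as a function of performance per watt.
# All inputs are HYPOTHETICAL; none come from the OpenAI/Broadcom announcement.

JOULES_PER_KWH = 3_600_000


def energy_cost_per_million_tokens(
    power_watts: float, tokens_per_second: float, price_per_kwh: float
) -> float:
    """Electricity cost, in the currency of price_per_kwh, to serve 1M tokens."""
    joules_per_token = power_watts / tokens_per_second
    kwh_per_million = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million * price_per_kwh


if __name__ == "__main__":
    # Hypothetical baseline accelerator: 1,000 W serving 5,000 tokens/s.
    baseline = energy_cost_per_million_tokens(1_000, 5_000, 0.10)
    # Hypothetical chip with twice the tokens per watt at the same power.
    improved = energy_cost_per_million_tokens(1_000, 10_000, 0.10)
    print(f"baseline: {baseline:.6f} per 1M tokens")
    print(f"improved: {improved:.6f} per 1M tokens")
```

In this toy model, doubling tokens per watt halves the energy bill per token, which is the mechanism behind any room to hold or cut API pricing.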

The broader signal is architectural. OpenAI is assembling the full hardware-software stack: its own models, its own inference software, and now its own silicon. Teams making multi-year infrastructure commitments to specific inference APIs should factor in that OpenAI’s cost floor is about to decouple from the Nvidia spot market.

Reported by OpenAI and Broadcom in a joint announcement on June 24, 2026, with corroborating coverage from TechCrunch, CNBC, and Tom’s Hardware.
