The most important structural shift in AI right now is not a new model. It is the emergence of a common substrate that makes models interchangeable. Vipul Ved Prakash, co-founder and CEO of Together AI, made that case on X on June 30, framing the Transformer architecture and the inference API layer as the connective tissue that is pulling the AI stack apart at its seams.
The argument maps cleanly onto prior platform disaggregations. When TCP/IP standardized packet routing, it stripped value from proprietary networking hardware and handed it upward to the application layer. When x86 became the default compute substrate, IBM’s vertical integration unraveled and the margin moved to Microsoft and Intel. Prakash’s framing suggests AI is now crossing a similar threshold: the Transformer is the standard, inference APIs are the socket, and the stack above and below that socket is where the real competition is playing out.
The losers in this model are the vertically integrated players who built their moats on proprietary end-to-end stacks. When the serving layer commoditizes and open-weight models can run on standard inference APIs without a meaningful quality penalty relative to closed systems, the premium for owning everything from chip to chat interface compresses. Closed labs that priced their advantage on inaccessibility now face a lasting price ceiling set by the open-weight alternative.
The winners are more dispersed. Chip designers who can serve inference workloads at lower cost per token capture margin that was previously absorbed by the model layer. Open-weight model developers gain distribution they could not have built on proprietary serving rails. Application builders gain the ability to swap underlying models on price or capability without rearchitecting their product. The value concentrates at the two ends of the stack: silicon and application, with the model layer under sustained commoditization pressure in the middle.
This is not a foregone conclusion. The history of platform disaggregations includes a long tail of cases where the software layer fought back. Oracle survived decades of commoditized x86 compute by making migration painful enough to retain pricing power. Closed AI labs with deep distribution and proprietary training data pipelines may find similar defensive surfaces. But the structural pressure Prakash identifies is real, and it compounds with every open-weight release that closes the capability gap.
The practical implication for builders is this: the inference API abstraction is now stable enough to build on, which means any product decision made today to lock in a single model is almost certainly wrong. Teams building AI products in the next ninety days should architect for model substitution from day one, not as a future optimization but as a first-order requirement. The cost of not doing so is not a refactor. It is a competitive exposure.
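One way to picture model substitution as a first-order requirement is a thin routing layer between product code and whichever inference API serves the model. The Python sketch below is illustrative only and does not come from Prakash's post. Every name in it (ModelBackend, ModelRouter, the two stand-in backends) is hypothetical, as are the prices and the coarse capability tiers; the lambdas stand in for real inference API calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# All names, prices, and capability tiers here are hypothetical placeholders,
# not any vendor's real API or pricing.
@dataclass(frozen=True)
class ModelBackend:
    name: str
    usd_per_million_tokens: float   # assumed pricing unit
    capability: int                 # assumed coarse quality tier; higher is better
    complete: Callable[[str], str]  # wraps whatever inference API call is in use


class ModelRouter:
    """Keeps product code independent of any single model."""

    def __init__(self) -> None:
        self._backends: Dict[str, ModelBackend] = {}

    def register(self, backend: ModelBackend) -> None:
        # Re-registering an existing name replaces that backend.
        self._backends[backend.name] = backend

    def pick(self, min_capability: int = 0) -> ModelBackend:
        # Choose the cheapest backend that meets the capability floor.
        eligible = [
            b for b in self._backends.values() if b.capability >= min_capability
        ]
        if not eligible:
            raise LookupError(f"no backend with capability >= {min_capability}")
        return min(eligible, key=lambda b: b.usd_per_million_tokens)

    def complete(self, prompt: str, min_capability: int = 0) -> str:
        # Product code calls this and never names a model directly.
        return self.pick(min_capability).complete(prompt)


router = ModelRouter()
# The lambdas below are stand-ins for real network calls to an inference API.
router.register(
    ModelBackend("closed-large", 10.0, 3, lambda p: f"[closed-large] {p}")
)
router.register(
    ModelBackend("open-weight", 1.0, 2, lambda p: f"[open-weight] {p}")
)

print(router.complete("Summarize this ticket.", min_capability=2))
```

Under these assumptions, swapping or adding a model is a one-line register call, and the choice between price and capability is made at the routing layer instead of being baked into the product.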
Vipul Ved Prakash of Together AI wrote on X on June 30, 2026.