Prompt engineering is losing its status as the primary skill in AI-assisted development. The practice displacing it went without a catchy name until June 2026, when Addy Osmani, an engineering lead at Google Chrome, settled on one: loop engineering. His synthesis, drawing on work by Boris Cherny at Anthropic and Peter Steinberger, shifts the unit of work from the instruction you type to the harness you build.
The mechanism is not complicated. A loop engineering system issues a prompt to a coding agent, receives the output, runs an evaluator against the result, and decides whether to stop or retry. The cycle repeats until a termination condition is satisfied. What looks like a single “AI wrote my code” interaction can actually be dozens of generate-evaluate-retry passes running without a human in the loop.
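A minimal Python sketch of that cycle follows. The names `run_loop`, `Verdict`, `fake_agent`, and `fake_grader` are hypothetical. The agent and grader are toy stand-ins for a real coding agent and evaluator, and `max_passes` is an assumed hard cap that acts as the fallback termination condition when the grader never passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool    # machine-readable signal from the evaluator
    feedback: str   # short explanation fed back into the next attempt


def run_loop(task: str,
             generate: Callable[[str], str],
             evaluate: Callable[[str], Verdict],
             max_passes: int = 10) -> tuple[Optional[str], int]:
    """Generate, evaluate, and retry until the evaluator passes or passes run out."""
    prompt = task
    for attempt in range(1, max_passes + 1):
        output = generate(prompt)        # generator: the coding agent
        verdict = evaluate(output)       # evaluator: a separate grader
        if verdict.passed:               # termination condition satisfied
            return output, attempt
        # Retry with the grader's feedback appended to the original task.
        prompt = f"{task}\n\nPrevious attempt failed: {verdict.feedback}"
    return None, max_passes              # gave up without a passing result


# Toy stand-ins: the "agent" only gets it right on the third try.
_attempts = iter([
    "def add(a, b): return a - b",
    "def add(a, b): return a * b",
    "def add(a, b): return a + b",
])


def fake_agent(prompt: str) -> str:
    return next(_attempts)


def fake_grader(code: str) -> Verdict:
    ns: dict = {}
    exec(code, ns)  # acceptable for a toy; real agent output belongs in a sandbox
    ok = ns["add"](2, 3) == 5
    return Verdict(ok, "" if ok else "add(2, 3) should return 5")


result, passes = run_loop("Write add(a, b).", fake_agent, fake_grader)
```

Feeding the grader's feedback into the next prompt is one design choice among several. A loop could also retry with the original prompt unchanged, or start a fresh agent session on each pass.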
The key structural move is separating the generator from the grader. In a well-designed loop, the model that writes the code is not the model that decides the work is done. A separate, often smaller, evaluator model inspects the output against explicit criteria. This matters because a model asked to judge its own work has an obvious conflict: it generated the code and has no incentive to surface its own mistakes. An independent evaluator with a narrow task (score this diff against the acceptance criteria) is harder to fool.
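The separation can be sketched as a grader that sees only the output and a list of explicit criteria, never the generator's reasoning. To keep the sketch runnable without a model API, deterministic string checks stand in for the smaller evaluator model the text describes. `grade_diff`, `Score`, and the three criteria are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical acceptance criterion: a name plus a predicate over the diff text.
Criterion = tuple[str, Callable[[str], bool]]


@dataclass
class Score:
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    @property
    def fraction(self) -> float:
        """Share of criteria met, as a machine-readable signal."""
        total = len(self.passed) + len(self.failed)
        return len(self.passed) / total if total else 0.0


def grade_diff(diff: str, criteria: list[Criterion]) -> Score:
    """Score a diff against explicit criteria; knows nothing about who wrote it."""
    score = Score()
    for name, check in criteria:
        if check(diff):
            score.passed.append(name)
        else:
            score.failed.append(name)
    return score


criteria: list[Criterion] = [
    ("adds a test", lambda d: "+def test_" in d),
    ("no debug prints", lambda d: "+    print(" not in d),
    ("no TODOs left", lambda d: "TODO" not in d),
]

diff = """\
+def parse(s):
+    return int(s)
+def test_parse():
+    assert parse("4") == 4
"""

result = grade_diff(diff, criteria)
```

Because `grade_diff` only receives the diff and the criteria, swapping the string checks for a call to a separate evaluator model would not change the loop's structure.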
Loop engineering demands three things that prompt engineering did not. You need an eval layer: a grader that produces a machine-readable signal, not a human opinion. You need cost ceilings: loops without budget controls will churn through tokens until they hit a timeout or an invoice. And you need stopping conditions precise enough to fire, but not so strict that they fire only when everything is already perfect. Writing a termination condition is closer to writing a test than writing a sentence.
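The budget and stopping requirements can be expressed as a small state object plus one decision function. The acceptance threshold `accept_at=0.9`, the `patience` of three non-improving passes, and the token figures are illustrative assumptions, not values from Osmani's writing; `LoopState`, `record_pass`, and `should_stop` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stop(Enum):
    ACCEPTED = "score met the acceptance threshold"
    BUDGET = "token budget exhausted"
    STALLED = "no improvement in recent passes"


@dataclass
class LoopState:
    token_budget: int                  # hypothetical hard ceiling, in tokens
    tokens_spent: int = 0
    best_score: float = 0.0
    passes_since_improvement: int = 0


def record_pass(state: LoopState, tokens: int, score: float) -> None:
    """Charge one pass against the budget and track whether the score improved."""
    state.tokens_spent += tokens
    if score > state.best_score:
        state.best_score = score
        state.passes_since_improvement = 0
    else:
        state.passes_since_improvement += 1


def should_stop(state: LoopState, score: float,
                accept_at: float = 0.9, patience: int = 3) -> Optional[Stop]:
    """Return the reason to stop, or None to run another pass."""
    if score >= accept_at:             # "good enough", not "perfect"
        return Stop.ACCEPTED
    if state.tokens_spent >= state.token_budget:
        return Stop.BUDGET
    if state.passes_since_improvement >= patience:
        return Stop.STALLED
    return None


# Simulated (tokens, score) pairs for four passes.
state = LoopState(token_budget=10_000)
stop = None
for tokens, score in [(3_000, 0.5), (3_000, 0.7), (3_000, 0.7), (3_000, 0.8)]:
    record_pass(state, tokens, score)
    stop = should_stop(state, score)
    if stop is not None:
        break
```

Because the budget is checked after each pass is paid for, this version can overshoot the ceiling by one pass (12,000 tokens against a 10,000 budget in the simulated run). A stricter harness would estimate the next call's cost before issuing it.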
Elvis Saravia, writing on X on June 20, 2026, surfaced and amplified Osmani’s framing and the underlying ideas it synthesized. The reception was immediate, which suggests the concept mapped onto something developers were already experiencing without the vocabulary for it.
The open risk, and it is a real one, is evaluator quality. A weak grader that consistently approves mediocre work does not slow down the loop. It speeds it up. The generator produces something plausible, the grader approves it, the loop exits, and the developer sees a green signal for code that actually contains a subtle bug. The harness pattern is only as reliable as the evaluator anchoring it. Teams building these systems now need to treat eval design as a first-class engineering task, not a configuration step.
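One way to treat eval design as an engineering task, offered here as an assumption rather than something the source prescribes, is to test the grader itself: feed it outputs with deliberately seeded bugs and measure how often it approves them. The sketch compares a hypothetical lenient grader that accepts anything that parses with one that executes the code on test inputs; `false_approval_rate` and both graders are illustrative names.

```python
from typing import Callable

Grader = Callable[[str], bool]  # True means "approve"


def false_approval_rate(grader: Grader, known_bad: list[str]) -> float:
    """Fraction of known-buggy outputs the grader wrongly approves."""
    if not known_bad:
        raise ValueError("need at least one known-bad sample")
    approved = sum(1 for sample in known_bad if grader(sample))
    return approved / len(known_bad)


def lenient_grader(code: str) -> bool:
    # Weak grader: approves anything that compiles.
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def behavioral_grader(code: str) -> bool:
    # Stronger grader: runs the function on test inputs (toy only; sandbox real output).
    ns: dict = {}
    try:
        exec(code, ns)
        return ns["add"](2, 3) == 5 and ns["add"](-1, 1) == 0
    except Exception:
        return False


# Seeded bugs: plausible-looking code that is subtly wrong.
known_bad = [
    "def add(a, b): return a - b",
    "def add(a, b): return a * b",
    "def add(a, b): return abs(a) + b",
    "def add(a, b) return a + b",   # syntax error
]

weak = false_approval_rate(lenient_grader, known_bad)
strong = false_approval_rate(behavioral_grader, known_bad)
```

A false-approval rate alone can be gamed by a grader that rejects everything, so in practice it would be paired with a set of known-good samples the grader is expected to accept.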
The practical implication for any team currently using AI coding tools: the prompt you write to the agent matters less than the test you write to grade what it returns. If your workflow does not yet include an automated grader between the agent and the merge button, the loop is running on trust.
Based on Elvis Saravia’s post on X (June 20, 2026), synthesizing ideas from Addy Osmani’s writing on loop engineering, which drew on work by Boris Cherny at Anthropic and Peter Steinberger.