Cursor published a detailed post-mortem on May 21 describing a year of lessons building cloud agents at scale, and the architecture that emerged maps almost exactly onto Google’s recently announced Agent Executor runtime. Two independent teams, solving the same class of problems, arrived at the same four primitives. That convergence is the story.

Full development environments as a prerequisite. Cursor’s early cloud agents ran in thin containers that lacked the tooling a developer would have locally. The failure mode was subtle: no crash, no error, just degraded output quality that looked like a model limitation until traced back to missing environment setup. The fix required rebuilding what amounts to enterprise IT for agents, including secret redaction, network policies, credential management, VM hibernation pipelines, and checkpoint-restore infrastructure. As models have gotten more capable, the environment has become the binding constraint on whether they perform at full potential.

Durable execution for long-running tasks. A work-stealing architecture, in which worker nodes picked up agents and looped them to completion, worked fine locally but collapsed under the reliability demands of the cloud. Early cloud agents ran at roughly one nine of reliability (about 90 percent). Cursor migrated to Temporal, a durable execution framework that handles retries, cross-machine scheduling, and persistence across node failures. The migration pushed reliability past two nines (99 percent). Temporal now processes more than 50 million actions per day across more than 7 million unique workflows for Cursor, and more than 40 percent of the company’s internal pull requests come from cloud agents. The team also moved from long-lived workflows to shorter ones that exit after a single task, making version upgrades substantially less painful.
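The core idea behind durable execution can be sketched in a few lines of standard-library Python. This is an illustration of the pattern only, not Temporal's API and not Cursor's implementation: the names `Journal`, `run_step`, and `demo` are hypothetical, and a local JSON-lines file stands in for the replicated store a real framework would use. Each completed step is persisted before the workflow moves on, so a replacement worker can replay the journal and resume instead of starting over.

```python
import json
import os
import tempfile
from typing import Callable


class Journal:
    """Hypothetical append-only record of completed steps (JSON lines on disk)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: dict[str, object] = {}
        # Replay any history left behind by a previous worker.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    entry = json.loads(line)
                    self.done[entry["step"]] = entry["result"]

    def record(self, step: str, result: object) -> None:
        # Persist the result before acknowledging the step as complete.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"step": step, "result": result}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.done[step] = result


def run_step(journal: Journal, name: str, fn: Callable[[], object],
             max_attempts: int = 3) -> object:
    """Run a step at most once per journal, retrying transient failures."""
    if name in journal.done:
        return journal.done[name]  # already completed, possibly on another machine
    last_error = None
    for _ in range(max_attempts):
        try:
            result = fn()
        except Exception as exc:  # a real system would retry only retryable errors
            last_error = exc
            continue
        journal.record(name, result)
        return result
    raise RuntimeError(f"step {name!r} failed after {max_attempts} attempts") from last_error


def demo() -> list[str]:
    """Simulate a worker dying after the first step and a second worker resuming."""
    calls: list[str] = []
    path = os.path.join(tempfile.mkdtemp(), "workflow.jsonl")

    def clone() -> str:
        calls.append("clone")
        return "repo-ready"

    def flaky_tests() -> str:
        calls.append("test")
        if calls.count("test") < 2:
            raise ConnectionError("simulated node failure")
        return "tests-passed"

    run_step(Journal(path), "clone", clone)  # first worker, then it "dies"

    journal = Journal(path)                  # second worker replays the journal
    run_step(journal, "clone", clone)        # skipped: result already recorded
    run_step(journal, "test", flaky_tests)   # fails once, succeeds on retry
    return calls
```

Calling `demo()` returns `["clone", "test", "test"]`: the clone step runs exactly once even though two workers asked for it, and the flaky test step is retried until it succeeds. Keeping workflows short, as the post describes, also keeps a journal like this small and easy to migrate across versions.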

Strict separation of agent state, machine state, and conversation state. When a single agent can spawn async subagents across multiple machines, or a subagent outlives its parent, keeping those three layers coupled becomes a reliability trap. Cursor decoupled them: the agent loop lives in Temporal rather than on any specific VM, pod lifecycles are managed independently, and the conversation layer uses an append-only storage mechanism with explicit retry semantics. If a step fails mid-stream, the client detects it, rewinds its stream, and replays the corrected output rather than showing stale data.
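The rewind-and-replay behavior can be illustrated with a small sketch. It assumes, purely for illustration, that each streamed chunk carries a step number and an attempt number; the names `Event`, `ConversationLog`, and `ClientView` are hypothetical and are not taken from Cursor's post. The log is never mutated: a retried step appends new events with a higher attempt number, and the client discards what it rendered for the failed attempt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    step: int      # which agent step produced this chunk
    attempt: int   # 1 for the first try, 2 for the first retry, and so on
    text: str


@dataclass
class ConversationLog:
    """Append-only: failed attempts stay in the log, retries are appended after them."""
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


@dataclass
class ClientView:
    """What the client currently shows, rebuilt by folding over the log."""
    chunks: list[Event] = field(default_factory=list)

    def apply(self, event: Event) -> None:
        current = max(
            (c.attempt for c in self.chunks if c.step == event.step), default=0
        )
        if event.attempt < current:
            return  # late chunk from an abandoned attempt: ignore it
        if event.attempt > current:
            # Rewind: drop everything rendered for this step so far.
            self.chunks = [c for c in self.chunks if c.step < event.step]
        self.chunks.append(event)

    def render(self) -> str:
        return "".join(c.text for c in self.chunks)


def replay(log: ConversationLog) -> str:
    """Rebuild the visible conversation from scratch, as a reconnecting client would."""
    view = ClientView()
    for event in log.events:
        view.apply(event)
    return view.render()


log = ConversationLog()
log.append(Event(step=1, attempt=1, text="Plan: "))
log.append(Event(step=2, attempt=1, text="run tes"))      # step fails mid-stream
log.append(Event(step=2, attempt=2, text="run tests. "))  # retry starts over
log.append(Event(step=2, attempt=2, text="Done."))
```

Here `replay(log)` yields `"Plan: run tests. Done."` while the log still holds all four events, including the partial output from the failed attempt. Because the view is derived from the log rather than stored alongside the agent loop, any client on any machine can reconstruct it.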

Self-healing infrastructure instead of static harness logic. The early approach hard-coded harness behavior for every edge case: force a commit here, grab CI failure logs there. As models improved, Cursor steadily moved logic out of the harness and into tools the agent controls directly. The current direction is to give agents instrumentation for diagnosing their own environment (missing secrets, blocked network access, broken dependencies) so that they act on those signals rather than silently degrading.
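A diagnostic tool of this kind might look like the sketch below. It is a guess at the general shape rather than a description of Cursor's tooling: `Finding`, `can_connect`, and `diagnose` are hypothetical names, and the checks cover only the three signal types named above. The point is that the agent receives structured findings it can reason about, instead of the harness guessing on its behalf.

```python
import importlib.util
import os
import shutil
import socket
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Finding:
    kind: str    # "missing_secret", "missing_tool", "missing_module", "blocked_network"
    detail: str


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Default network probe: try to open a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def module_available(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def diagnose(
    required_env: Sequence[str],
    required_tools: Sequence[str],
    required_modules: Sequence[str],
    required_hosts: Sequence[tuple[str, int]],
    env: Optional[Mapping[str, str]] = None,
    probe: Callable[[str, int], bool] = can_connect,
) -> list[Finding]:
    """Return a structured list of environment problems for the agent to act on."""
    env = os.environ if env is None else env
    findings: list[Finding] = []
    for name in required_env:
        if not env.get(name):
            # Report the variable name only, never its value.
            findings.append(Finding("missing_secret", name))
    for tool in required_tools:
        if shutil.which(tool) is None:
            findings.append(Finding("missing_tool", tool))
    for module in required_modules:
        if not module_available(module):
            findings.append(Finding("missing_module", module))
    for host, port in required_hosts:
        if not probe(host, port):
            findings.append(Finding("blocked_network", f"{host}:{port}"))
    return findings
```

An agent could call something like `diagnose(["NPM_TOKEN"], ["git"], ["json"], [("registry.npmjs.org", 443)])` at the start of a task and surface any findings to the user or attempt a fix, which is the opposite of the silent quality degradation described earlier. The specific variable, tool, and host names in that call are placeholders.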

The point worth dwelling on is that these four patterns did not emerge from a single team’s idiosyncratic choices. Cursor built them iteratively over a year of production failures. Google shipped them as a defined runtime for third-party agent developers. The fact that two organizations converged on the same architecture from different directions suggests this is the correct substrate for long-running agent work, not one vendor’s opinion.

For teams running their own agent infrastructure, the checklist is direct: isolated full-stack development environments, a durable execution layer (Temporal or equivalent), explicit decoupling of agent state, machine state, and conversation state, and self-reporting instrumentation in place of defensive harness code. The first item to audit is usually the environment. Most teams discover their agents are running in something closer to a bare container than a developer workstation, and that gap explains more underperformance than the model does.

If your team is currently hitting reliability ceilings on long-running agent tasks, migrating the agent loop to a durable execution framework is the highest-leverage re-architecture to prioritize before adding further capability.

Published on the Cursor engineering blog on 2026-05-21.