Introspection, a startup founded by former xAI agent-infrastructure engineers, is building tooling for what its team calls autoresearch: an outer loop of agents that maintain and improve a primary system using evals, feedback signals and human input. Co-founder and CEO Roland Gavrilescu described the approach in an interview with Latent Space published ahead of his “Autoresearch in the Wild” talk at the AI Engineer World’s Fair.

Gavrilescu and co-founder Julian Bright met at xAI, where they worked on agent infrastructure and cloud agents before leaving to start Introspection. Gavrilescu told Latent Space they saw a new agent form factor that xAI’s environment did not let them pursue directly, and looked instead at what made Cursor and Cognition work as products.

The company’s core idea separates two loops. The inner loop is the primary agent system doing the work: writing code, serving users, executing tasks. The outer loop is a second system that studies the inner loop and decides how to improve it, without burning excessive tokens figuring out what to fix. That second loop, Gavrilescu argued, is now the product, not the underlying model.
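The two-loop split can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `score_config` and `outer_loop`, the dictionary-based configuration, and the single numeric judge do not come from the interview and are not Introspection's implementation. The inner system is any callable that does the work; the outer loop only scores candidate configurations against a fixed task set and keeps the best one.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a config is a flat dict of harness settings.
Config = Dict[str, str]
Inner = Callable[[Config, str], str]   # inner loop: (config, task) -> output
Judge = Callable[[str, str], float]    # judge: (task, output) -> score in [0, 1]


def score_config(inner: Inner, config: Config, tasks: List[str], judge: Judge) -> float:
    """Run the inner system on every task and average the judge's scores."""
    return mean(judge(task, inner(config, task)) for task in tasks)


def outer_loop(
    inner: Inner,
    current: Config,
    candidates: List[Config],
    tasks: List[str],
    judge: Judge,
) -> Tuple[Config, float]:
    """Keep the current config unless a candidate scores strictly higher."""
    best, best_score = current, score_config(inner, current, tasks, judge)
    for candidate in candidates:
        candidate_score = score_config(inner, candidate, tasks, judge)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

A real outer loop would also have to decide which candidates are worth evaluating at all, which is where the token-spend concern Gavrilescu raised comes in; this sketch evaluates every candidate it is given.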

Introspection is proposing a unit it calls an agent recipe, modeled on the data recipes used in model post-training that specify how much data from each domain gets baked into a model. An agent recipe instead bundles the harness configuration, the evals, the judges and the accumulated human expertise that shaped a system’s current state, along with the failure history that produced each addition. Gavrilescu’s example: handing someone the Devin codebase today would tell them little without the record of the mistakes and decisions that produced it. The recipe is meant to be that record, in a format portable across model providers.
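One way to picture such a bundle is as a small serializable record. The sketch below is hypothetical: the interview does not describe Introspection's actual recipe format, so the class names `AgentRecipe` and `FailureRecord`, every field name, and the JSON serialization are assumptions chosen only to mirror the components listed above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FailureRecord:
    """A mistake seen in the running system and the addition it led to (hypothetical shape)."""
    description: str
    resulting_addition: str  # name of the eval, judge, or rule added in response


@dataclass
class AgentRecipe:
    """Hypothetical bundle of the parts the article lists for an agent recipe."""
    harness_config: dict
    evals: list = field(default_factory=list)
    judges: list = field(default_factory=list)
    human_expertise: list = field(default_factory=list)
    failure_history: list = field(default_factory=list)

    def why(self, addition: str) -> list:
        """Return the failure descriptions that motivated a given addition."""
        return [
            record.description
            for record in self.failure_history
            if record.resulting_addition == addition
        ]

    def to_json(self) -> str:
        """Serialize to plain JSON so the record is not tied to one provider's tooling."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

The `why` lookup is the point of the exercise: it ties each eval or judge back to the failure that produced it, which is the history Gavrilescu said a bare codebase does not carry.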

The infrastructure runs on Pi, an open-source agent harness that Gavrilescu compared to Linux: not intended to run unmodified, but designed to be extended with different configuration files loaded into the same runtime. Introspection’s pitch is to pair that extensibility with recipes and managed production infrastructure covering cost control and security, capabilities Gavrilescu said exist inside frontier labs but are not yet packaged for outside companies.

Humans stay explicitly in the loop under this design. Gavrilescu described an “ask a human” tool that lets agents query people directly when uncertain, comparing it to a new employee who asks many questions early on and gradually needs fewer as the system accumulates preferences. This is a structural claim, not a hedge: Introspection is arguing that autonomy has to be earned through repeated interaction with human judgment before a system can operate with less supervision, not assumed as a starting condition.
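The new-employee comparison can be sketched as a tool that remembers answers, so that repeated questions stop reaching a person. This is an illustrative assumption, not the tool Gavrilescu described: the class name `AskAHuman`, the `resolve` method, and the exact-match lookup on normalized question text are all hypothetical, and a production system would need a far better notion of when two questions are the same.

```python
from typing import Callable, Dict


class AskAHuman:
    """Hypothetical 'ask a human' tool that caches answers as preferences."""

    def __init__(self, ask: Callable[[str], str]) -> None:
        self._ask = ask                      # how a person is actually reached
        self._preferences: Dict[str, str] = {}
        self.questions_asked = 0             # how often a human was interrupted

    def resolve(self, question: str) -> str:
        """Answer from stored preferences if possible, otherwise ask a person."""
        key = " ".join(question.lower().split())  # normalize case and whitespace
        if key not in self._preferences:
            self.questions_asked += 1
            self._preferences[key] = self._ask(question)
        return self._preferences[key]
```

Tracking `questions_asked` over time gives a rough measure of the pattern described above: a falling rate of interruptions as preferences accumulate.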

The company’s initial customers are software engineers at vertical SaaS companies who want agents working inside their own Git repositories without depending on a single model provider. Everything is Git-based, Gavrilescu said, so the repository itself becomes the audit trail. That is a narrower bet than a general-purpose “autonomous software factory,” and Gavrilescu pushed back on the idea that full autonomy is achievable at launch. Models lack the tacit organizational knowledge to run a factory unsupervised on day one, he said, so the near-term architecture looks more like an orchestra with a human conductor than a fully automated production line.

Latent Space’s interview does not include usage numbers, funding figures or named customers for Introspection, so the company’s traction cannot yet be evaluated independently. The company’s advice to engineers experimenting with autoresearch on their own centers on three things: investing in signal quality before scale, capping the cost of unsupervised agent loops, and studying how frontier labs structure their own research harnesses.
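The second piece of advice, capping the cost of unsupervised loops, lends itself to a short sketch. The function name `run_capped_loop`, the convention that each step reports its own cost in cents, and the iteration limit are assumptions for illustration; the interview does not say how Introspection enforces cost limits.

```python
from typing import Callable, List, Tuple


def run_capped_loop(
    step: Callable[[], Tuple[str, int]],
    budget_cents: int,
    max_iterations: int = 100,
) -> Tuple[List[str], int]:
    """Run `step` repeatedly until the budget or the iteration limit is reached.

    `step` is a hypothetical callable returning (result, cost_in_cents).
    The budget is checked before each step, so total spend can exceed the
    cap by at most the cost of one step.
    """
    spent = 0
    results: List[str] = []
    for _ in range(max_iterations):
        if spent >= budget_cents:
            break  # stop before starting another paid step
        result, cost = step()
        spent += cost
        results.append(result)
    return results, spent
```

Costs are kept as integer cents to avoid floating-point drift in the running total. A stricter variant could refuse to start a step whose estimated cost would cross the cap, at the price of needing a cost estimate up front.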

For teams already running coding agents in production, the practical next step is auditing whether feedback from those agents currently reaches any structured eval or judge, or whether it disappears after each session. Introspection’s bet is that the companies capturing that signal now will own a compounding advantage once agent recipes become a standard unit of comparison.

Reported by Latent Space, based on an interview conducted at the AI Engineer World’s Fair.