Perplexity bets that models should write their own search plans

Perplexity Research published a paper on June 1 arguing that the standard retrieval-augmented generation architecture is structurally limited for complex queries, and proposing a replacement it calls Search as Code (SaC).

The core diagnosis is straightforward. Traditional RAG pipelines are monolithic: one retrieval call, one ranker, one reranker, the same sequence regardless of query complexity. That works adequately for simple factual lookups. For compound questions requiring multiple retrieval steps, it breaks down because the pipeline cannot adapt to what it finds mid-execution.

SaC inverts the architecture. Rather than fixing the retrieval sequence in advance, the approach exposes search primitives as a programmable SDK. The model generates code at query time, composing retrieval, filtering, ranking, and synthesis steps based on what the specific query requires. The code is the search plan, written fresh for each task.
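To make the idea concrete, here is a minimal sketch of what "the code is the search plan" could look like. It is not Perplexity's SDK: the primitives `search`, `filter_docs`, and `rank`, the toy corpus, and the `run_plan` helper are all hypothetical, and the plan string stands in for code a model would generate at query time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


# Toy in-memory corpus standing in for a real index (hypothetical data).
CORPUS = [
    Doc("a", "Acme acquired Beta Corp in 2021"),
    Doc("b", "Beta Corp was founded by Dana Lee"),
    Doc("c", "Dana Lee previously worked at Gamma Labs"),
    Doc("d", "Acme reported record revenue"),
]


def _overlap(query: str, doc: Doc) -> int:
    """Number of query terms that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def search(query: str) -> list[Doc]:
    """Hypothetical primitive: return documents sharing at least one term."""
    return [d for d in CORPUS if _overlap(query, d) > 0]


def filter_docs(docs: list[Doc], predicate) -> list[Doc]:
    """Hypothetical primitive: keep documents that satisfy the predicate."""
    return [d for d in docs if predicate(d)]


def rank(docs: list[Doc], query: str) -> list[Doc]:
    """Hypothetical primitive: order documents by term overlap, best first."""
    return sorted(docs, key=lambda d: _overlap(query, d), reverse=True)


# What a model might emit for "Who founded the company that Acme acquired?".
# The second search depends on the result of the first.
PLAN_SRC = '''
hits = filter_docs(search("acme acquired"), lambda d: "acquired" in d.text)
first = rank(hits, "acme acquired")[0]
company = first.text.split("acquired ")[1].split(" in ")[0]
follow_up = company + " founded"
result = rank(search(follow_up), follow_up)[0]
'''


def run_plan(plan_src: str) -> Doc:
    """Execute a generated plan with only the search primitives in scope.

    Assumption: a production system would sandbox this step; plain exec()
    is used here purely for illustration.
    """
    namespace = {"search": search, "filter_docs": filter_docs, "rank": rank}
    exec(plan_src, namespace)
    return namespace["result"]


answer_doc = run_plan(PLAN_SRC)
print(answer_doc.text)
```

The point of the sketch is the dependency between steps: the second query is built from what the first one returned, which a fixed retrieve-then-rerank sequence has no way to express.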

Perplexity reported benchmark results on WANDR, a retrieval benchmark designed around compound questions that require multiple dependent retrieval steps. SaC outperformed the systems Perplexity tested against on that benchmark. The caveat is that WANDR is the benchmark Perplexity chose to report on. The paper does not include head-to-head comparisons on simpler benchmarks where monolithic RAG performs adequately, and a vendor’s research on its own product carries an obvious promotional incentive. Independent replication on standard RAG benchmarks has not appeared yet.

The compute framing is notable. Perplexity’s position is that SaC is more efficient than the alternative agentic-search pattern, which tends toward brute-force multi-call retrieval: fire multiple searches, collect results, synthesize. SaC plans the search before executing it, in theory spending fewer tokens on dead ends.

Developers familiar with LangChain or LlamaIndex will recognize the surface similarity: both frameworks offer composable retrieval chains. The distinction Perplexity is drawing is about when the composition happens. LangChain chains are defined at build time by the developer. SaC chains are written at runtime by the model. Compared with a fixed chain, the tradeoff is higher compute per query in exchange for higher accuracy on hard queries, because the model can see intermediate results before deciding the next retrieval step.
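The build-time versus runtime distinction can be sketched in a few lines. This is an illustration only and uses neither LangChain's nor Perplexity's API: the toy `INDEX`, `retrieve`, `run_fixed_chain`, and the rule-based `toy_planner` (a stand-in for the model) are all hypothetical names.

```python
from typing import Optional

# Toy index mapping a query string to documents (hypothetical data).
INDEX = {
    "acme acquisition": ["Acme acquired Beta Corp"],
    "beta corp founder": ["Beta Corp was founded by Dana Lee"],
}


def retrieve(query: str) -> list[str]:
    """Look up documents for an exact query string."""
    return list(INDEX.get(query, []))


def run_fixed_chain(queries: list[str]) -> list[str]:
    """Build-time composition: the developer fixed `queries` in advance."""
    docs: list[str] = []
    for query in queries:
        docs += retrieve(query)
    return docs


def toy_planner(docs: list[str]) -> Optional[str]:
    """Stand-in for the model: choose the next query from what was found so far."""
    if not docs:
        return "acme acquisition"
    found_company = any("Beta Corp" in d for d in docs)
    found_founder = any("founded" in d for d in docs)
    if found_company and not found_founder:
        return "beta corp founder"
    return None  # nothing further to look up


def run_runtime_plan(max_steps: int = 5) -> list[str]:
    """Runtime composition: each step is decided after seeing earlier results."""
    docs: list[str] = []
    for _ in range(max_steps):
        query = toy_planner(docs)
        if query is None:
            break
        docs += retrieve(query)
    return docs


fixed = run_fixed_chain(["acme acquisition"])
adaptive = run_runtime_plan()
print(len(fixed), len(adaptive))
```

In this toy setup the fixed chain stops after the one query it was built with, while the runtime loop issues a second query only because the first result named a company. The extra planner call per step is also where the added per-query cost comes from.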

The SaC framing has conceptual overlap with OpenAI’s tool-calling architecture and with the broader trend of models managing their own context. What Perplexity is adding is a retrieval-specific SDK with a programmable surface, rather than exposing generic tool calls. Whether that specificity produces measurably better results outside Perplexity-chosen benchmarks is the open question.

For teams running RAG in production, the practical division is this: simple, uniform queries do not benefit from SaC. The architecture adds latency and compute cost per query in exchange for accuracy on compound questions. Teams running agentic research workflows, competitive intelligence tools, or any product where users ask multi-step questions should pilot SaC against their current baseline before dismissing or adopting it on the strength of a single benchmark.
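A pilot of this kind can start small. The following is a rough sketch of a comparison harness, assuming each system is a callable that maps a question to an answer string and that queries are split into simple and compound sets. The stub systems, the two example questions, and the exact-match scoring are placeholders, not real measurements or a recommended metric.

```python
import time
from statistics import mean


def evaluate(system, dataset):
    """Return exact-match accuracy and mean latency for one system on one dataset."""
    correct = 0
    latencies = []
    for question, expected in dataset:
        start = time.perf_counter()
        answer = system(question)
        latencies.append(time.perf_counter() - start)
        correct += int(answer == expected)
    return {
        "accuracy": correct / len(dataset),
        "mean_latency_s": mean(latencies),
    }


# Placeholder query sets; a real pilot would use logged production queries.
SIMPLE = [("capital of France", "Paris")]
COMPOUND = [("founder of the company Acme acquired", "Dana Lee")]

# Stub systems standing in for the current baseline and the candidate.
BASELINE_ANSWERS = {"capital of France": "Paris"}
CANDIDATE_ANSWERS = {
    "capital of France": "Paris",
    "founder of the company Acme acquired": "Dana Lee",
}


def baseline(question: str) -> str:
    return BASELINE_ANSWERS.get(question, "unknown")


def candidate(question: str) -> str:
    return CANDIDATE_ANSWERS.get(question, "unknown")


# Report accuracy and latency per system and per query slice.
report = {
    name: {
        "simple": evaluate(system, SIMPLE),
        "compound": evaluate(system, COMPOUND),
    }
    for name, system in {"baseline": baseline, "candidate": candidate}.items()
}

for name, slices in report.items():
    for slice_name, metrics in slices.items():
        print(name, slice_name, metrics)
```

Reporting the two slices separately matters here: an aggregate score would hide whether any gain on compound questions is paid for with added latency on the simple queries that make up most traffic.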

Based on research published by Perplexity Research on June 1, 2026, at research.perplexity.ai.
