Ramp pointed 10,000 coding-agent sessions at its backend in 8 hours

A minimal find-security-issues prompt across thousands of parallel Inspect sessions surfaced high-severity findings, demonstrating an emerging pattern in scaled red-team automation.

Alessandro Benigni

PUBLISHED MAY 29, 2026


Ramp, the corporate-card and finance-automation company, posted findings on May 28 from an experiment in which roughly 10,000 Inspect coding-agent sessions ran in parallel against its own production backend over an 8-hour window. The prompt was minimal: find security issues. The output was a non-trivial set of high-severity findings.

The experiment is interesting mainly for its method. Standard security scanners (Snyk, Semgrep, Socket, and Bumblebee, which we covered May 22) apply rule-based pattern matching. A single AI security-review session, by contrast, produces output of variable quality, with miss rates that depend on the prompt and the model. Ramp’s approach was to scale horizontally: run thousands of independent sessions, each with a fresh model context, and aggregate the findings that recur across many of them. How often a finding recurs across independent sessions then serves as a proxy for confidence that the underlying issue is real.
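The aggregation step can be sketched in a few lines of Python. This is an illustrative sketch only, not Ramp's pipeline: the `Finding` fields, the `(file, category)` grouping key, and the `min_sessions` threshold are all assumptions made for the example, since the post does not describe how findings were deduplicated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reported issue. All fields are hypothetical, for illustration."""
    session_id: str  # which independent session reported it
    file_path: str   # where the session says the issue lives
    category: str    # e.g. "idor", "ssrf"


def recurring_findings(findings, min_sessions=3):
    """Group findings by (file, category) and keep those reported by at
    least `min_sessions` distinct sessions, most-recurring first."""
    sessions_by_key = defaultdict(set)
    for f in findings:
        # Normalize case and whitespace so trivially different reports merge.
        key = (f.file_path.strip().lower(), f.category.strip().lower())
        # A set means repeated reports from one session count only once.
        sessions_by_key[key].add(f.session_id)

    ranked = [
        (key, len(sessions))
        for key, sessions in sessions_by_key.items()
        if len(sessions) >= min_sessions
    ]
    # Sort by descending session count, then by key for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Counting distinct sessions rather than raw reports matters here: a single session that repeats itself should not look like independent confirmation. A real pipeline would likely need fuzzier matching than an exact `(file, category)` key, since different sessions may describe the same issue in different terms.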

The practical question is whether the cost per finding for scaled AI security review is now competitive with traditional static analysis. At current API rates, 10,000 sessions over an 8-hour window is a substantial spend. The Ramp post does not disclose the absolute figure, but back-of-envelope estimates put it in the range of a single senior security engineer’s monthly compensation. If the findings include issues that standard scanners did not surface, the trade is likely worth it.
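For teams that want to run the same back-of-envelope arithmetic on their own numbers, a minimal sketch follows. Every input is a parameter the reader must supply: the Ramp post does not disclose token counts, prices, or the number of confirmed findings, and the function names are hypothetical.

```python
def estimated_run_cost(
    sessions,
    input_tokens_per_session,
    output_tokens_per_session,
    usd_per_million_input,
    usd_per_million_output,
):
    """Total API cost in USD for a batch of identical-sized sessions."""
    per_session = (
        input_tokens_per_session * usd_per_million_input
        + output_tokens_per_session * usd_per_million_output
    ) / 1_000_000
    return sessions * per_session


def cost_per_finding(total_cost_usd, confirmed_findings):
    """Cost per confirmed finding; undefined when nothing was confirmed."""
    if confirmed_findings <= 0:
        raise ValueError("need at least one confirmed finding")
    return total_cost_usd / confirmed_findings
```

The useful comparison is `cost_per_finding` restricted to issues that existing scanners missed, set against what the same budget buys in engineer time or scanner licences.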

For internal security teams evaluating whether to add scaled AI red-teaming to their CI pipeline, the Ramp experiment is among the most concrete published demonstrations of the pattern so far.

Posted by Ramp Labs on X on 2026-05-28.
