Ramp, the corporate-card and finance-automation company, posted findings on May 28 from an experiment in which roughly 10,000 Inspect coding-agent sessions ran in parallel against its own production backend over an 8-hour window. The prompt was minimal: find security issues. The output was a non-trivial set of high-severity findings.
The experiment is methodologically interesting. Standard security scanning tools (Snyk, Semgrep, Socket, Bumblebee, which we covered May 22) apply rule-based pattern matching. A single AI security-review session, by contrast, produces output of variable quality, with a miss rate that depends on the prompt and the model. Ramp’s approach was to scale horizontally: run thousands of independent sessions, each with a fresh model context, and aggregate the findings that recur across many of them. Recurrence across independent sessions then serves as a proxy for confidence that the underlying issue is real.
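The aggregation step can be sketched in a few lines. This is a minimal illustration and not Ramp's pipeline: the `Finding` fields, the `(file_path, issue_class)` deduplication key, and the `min_share` threshold are all assumptions made for the example. It scores each distinct issue by the share of independent sessions that reported it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # All fields are hypothetical; the Ramp post does not describe a schema.
    session_id: str   # which independent session reported the issue
    file_path: str    # normalized location of the suspected issue
    issue_class: str  # coarse label such as "idor" or "ssrf"


def recurring_findings(findings, total_sessions, min_share=0.01):
    """Rank issues by the share of independent sessions that reported them."""
    sessions_by_issue = defaultdict(set)
    for f in findings:
        # A set counts each session once per issue, so a single session
        # repeating itself cannot inflate the score.
        sessions_by_issue[(f.file_path, f.issue_class)].add(f.session_id)

    ranked = []
    for issue, sessions in sessions_by_issue.items():
        share = len(sessions) / total_sessions
        if share >= min_share:  # assumed cutoff; tune against triage results
            ranked.append((issue, share))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

In practice the hard part is the deduplication key: two sessions rarely describe the same bug in the same words, so a real system would likely need fuzzier matching than an exact file-and-label pair.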
The practical question is whether the cost per finding for scaled AI security review is now competitive with traditional static analysis. At current API rates, 10,000 sessions across an 8-hour window adds up to a substantial bill. The Ramp post does not disclose the absolute figure, but back-of-envelope estimates put it in the range of a single senior security engineer’s monthly compensation. If the findings include issues that standard scanners did not surface, the trade is worth it.
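For teams running their own numbers, the comparison reduces to simple arithmetic. The sketch below is hypothetical: only the 10,000-session count comes from the Ramp post, and the per-session cost and finding counts are placeholder inputs, not reported figures. It divides total spend by the findings that existing scanners did not already catch.

```python
def cost_per_novel_finding(sessions, cost_per_session_usd,
                           confirmed_findings, also_found_by_scanners):
    """Return (total_cost, cost_per_novel_finding) in USD.

    The second value is None when the run surfaced nothing that
    existing scanners had not already reported.
    """
    total_cost = sessions * cost_per_session_usd
    novel = confirmed_findings - also_found_by_scanners
    if novel <= 0:
        return total_cost, None
    return total_cost, total_cost / novel


# Placeholder inputs for illustration only; substitute measured values.
total, per_finding = cost_per_novel_finding(
    sessions=10_000,            # from the Ramp post
    cost_per_session_usd=2.0,   # assumed, not disclosed by Ramp
    confirmed_findings=30,      # assumed
    also_found_by_scanners=10,  # assumed
)
```

The number worth tracking is the second return value: a run that only rediscovers what Semgrep or Snyk already flags has no novel findings to spread the cost over.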
For internal security teams now evaluating whether to add scaled AI red-teaming to their CI pipeline, the Ramp experiment is the most concrete published demonstration of the pattern.
Posted by Ramp Labs on X on 2026-05-28.