Ramp built a private SWE-Bench from its own production bugs

Ramp's in-house coding benchmark, drawn from real financial-software problems, is how the company now selects coding models rather than trusting public leaderboards.

Alessandro Benigni

PUBLISHED JUN 16, 2026

Ramp has built a private coding benchmark derived entirely from engineering problems encountered inside its own financial-software stack, the company announced late last week on X. The eval gives Ramp’s engineering team a production-grounded way to compare coding models against work that actually reflects its codebase and domain constraints.

Public SWE-Bench has a contamination problem. Model providers train against its test cases, and leaderboard positions no longer reliably predict how a model performs on proprietary code. Ramp’s move is the logical response: if public evals are gameable, build one that isn’t.

Engineering leaders evaluating coding models should treat this as a template. A private benchmark built from your real bug backlog and internal tooling will surface capability gaps that SWE-Bench cannot. The cost of building one is low; the signal gain over public leaderboards is not.
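Ramp has not published how its harness works, so the following is only a minimal sketch of what a private benchmark built from a bug backlog could look like. Every name in it (BugTask, pass_rate, compare_models, the toy to_cents task, and the two stub "models") is hypothetical and stands in for real backlog items and real model calls. Each task pairs a bug description with a hidden check, and each model is scored by the fraction of checks its proposed fix passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BugTask:
    """One benchmark item derived from a past bug (hypothetical schema)."""
    task_id: str
    description: str                 # issue text shown to the model
    check: Callable[[str], bool]     # hidden test: True if the candidate fix passes


def pass_rate(tasks: List[BugTask], solve: Callable[[str], str]) -> float:
    """Fraction of tasks whose hidden check accepts the model's candidate fix."""
    if not tasks:
        raise ValueError("benchmark needs at least one task")
    passed = 0
    for task in tasks:
        try:
            candidate = solve(task.description)
            ok = bool(task.check(candidate))
        except Exception:
            # A crash in the model call or in the candidate code counts as a failure.
            ok = False
        passed += ok
    return passed / len(tasks)


def compare_models(
    tasks: List[BugTask], models: Dict[str, Callable[[str], str]]
) -> Dict[str, float]:
    """Score every model on the same private task set."""
    return {name: pass_rate(tasks, solve) for name, solve in models.items()}


# Toy task (an assumption for illustration): a currency-rounding bug.
def _check_to_cents(candidate_src: str) -> bool:
    namespace: dict = {}
    exec(candidate_src, namespace)   # run untrusted code only in a sandbox in practice
    return namespace["to_cents"]("19.99") == 1999


TASKS = [
    BugTask("toy-1", "to_cents('19.99') returns 1998; fix it.", _check_to_cents),
]

# Stub "models": plain functions standing in for real model API calls.
def decimal_fix(prompt: str) -> str:
    return (
        "from decimal import Decimal\n"
        "def to_cents(s):\n"
        "    return int(Decimal(s) * 100)\n"
    )


def float_fix(prompt: str) -> str:
    return "def to_cents(s):\n    return int(float(s) * 100)\n"


results = compare_models(TASKS, {"decimal_fix": decimal_fix, "float_fix": float_fix})
```

In a real setup the checks would presumably be the regression tests that shipped with each original fix, run in an isolated environment, and the task set would stay out of any public repository so it cannot leak into training data.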

Ramp announced the private benchmark on X on June 13, 2026.
