Ramp has built a private coding benchmark derived entirely from engineering problems encountered inside its own financial-software stack, the company announced late last week on X. The eval gives Ramp’s engineering team a production-grounded way to compare coding models on work that reflects the company’s own codebase and domain constraints.

Public SWE-Bench has a contamination problem. Model providers train against its test cases, and leaderboard positions no longer reliably predict how a model performs on proprietary code. Ramp’s move is the logical response: if public evals are gameable, build one that isn’t.

Engineering leaders evaluating coding models should treat this as a template. A private benchmark built from your real bug backlog and internal tooling will surface capability gaps that SWE-Bench cannot. The cost of building one is low; the signal gain over public leaderboards is not.
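
For teams wondering what such a harness might look like at its smallest, here is a rough Python sketch. It does not describe Ramp's implementation, which has not been published in detail. The `Task` schema, the `check` callable (a stand-in for running a repository's hidden tests), and the toy models are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """One benchmark item drawn from an internal backlog (hypothetical schema)."""
    task_id: str
    prompt: str                    # description of the bug or change request
    check: Callable[[str], bool]   # stand-in for running the repo's hidden tests


def pass_rate(model: Callable[[str], str], tasks: List[Task]) -> float:
    """Return the fraction of tasks whose check accepts the model's output."""
    if not tasks:
        raise ValueError("benchmark has no tasks")
    passed = 0
    for task in tasks:
        try:
            if task.check(model(task.prompt)):
                passed += 1
        except Exception:
            # A crash in the model call or in the check counts as a failure.
            pass
    return passed / len(tasks)


def compare(models: Dict[str, Callable[[str], str]],
            tasks: List[Task]) -> Dict[str, float]:
    """Score every candidate model on the same private task set."""
    return {name: pass_rate(fn, tasks) for name, fn in models.items()}


# Toy data only: real tasks would wrap actual tickets and test suites.
tasks = [
    Task("task-001", "Return the ISO 4217 code for the US dollar",
         lambda out: out.strip() == "USD"),
    Task("task-002", "Return the number of cents in one dollar",
         lambda out: out.strip() == "100"),
]

# Toy "models": plain functions standing in for calls to a model API.
models = {
    "model_a": lambda prompt: "USD" if "ISO" in prompt else "100",
    "model_b": lambda prompt: "USD",
}

scores = compare(models, tasks)
print(scores)
```

In practice the `check` step is where most of the effort goes: it would apply the model's patch in a sandboxed checkout and run tests the model never sees. Keeping the task set out of any public repository is what protects it from the contamination described above.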

Ramp announced the private benchmark on X on June 13, 2026.