ServiceNow ships EVA-Bench 2.0 with 121 tools and 213 scenarios

The open-source enterprise agent benchmark now spans airline, IT, and healthcare workflows, a scale large enough to expose brittle agent frameworks.

Alessandro Benigni

PUBLISHED JUN 6, 2026

Most public benchmarks for enterprise agents test narrow, synthetic tasks. ServiceNow Research published EVA-Bench Data 2.0 on Hugging Face on June 4, expanding its evaluation set to three domains: Airline Customer Service Management, Enterprise IT Service Management, and Healthcare HR Service Delivery. Together they cover 213 scenarios across 121 tools, roughly four times the scope of the original release.

The scale matters because agent failures in enterprise deployments are domain-specific. A framework that handles flight rebooking without errors can break on FMLA policy lookups. The 121-tool surface area, with adversarial calls, multi-intent conversations, and unsatisfiable user goals included, is the kind of fan-out that exposes where orchestration logic goes brittle.
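One way to make domain-specific failures visible is to report pass rates per domain rather than as a single aggregate. The sketch below is a minimal illustration under stated assumptions: the result tuples, the domain labels, and the function name `pass_rate_by_domain` are hypothetical, and the outcomes are made-up placeholder values, not EVA-Bench results or its actual schema.

```python
from collections import defaultdict

# Hypothetical per-scenario outcomes as (domain, passed) pairs.
# Placeholder illustration data only; not real benchmark results.
results = [
    ("airline", True),
    ("airline", True),
    ("it", True),
    ("it", False),
    ("healthcare_hr", False),
    ("healthcare_hr", False),
]


def pass_rate_by_domain(results):
    """Return a dict mapping each domain to its fraction of passed scenarios."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for domain, passed in results:
        totals[domain] += 1
        passes[domain] += int(passed)
    return {domain: passes[domain] / totals[domain] for domain in totals}


# A single aggregate score hides where the failures are concentrated.
overall = sum(passed for _, passed in results) / len(results)
by_domain = pass_rate_by_domain(results)

print(f"overall: {overall:.2f}")
for domain, rate in by_domain.items():
    print(f"{domain}: {rate:.2f}")
```

With this placeholder data the overall pass rate is 0.50, while the per-domain rates range from 1.00 down to 0.00, which is the pattern a single headline score would conceal.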

The commercial alignment is obvious: ServiceNow benefits if the canonical enterprise agent benchmark maps to its own workflow topology. Read the scores with that in mind. Still, the underlying dataset is open under the MIT license and structured for drop-in use with standard evaluation harnesses, which gives it practical utility beyond ServiceNow’s leaderboard.

Teams evaluating enterprise voice or tool-calling agents should run their stack against this before committing to an architecture for customer-facing deployments.

ServiceNow Research on Hugging Face (huggingface.co/ServiceNow-AI), published June 4, 2026.
