Cerebras clocks Kimi K2.6 at ~1,000 tokens per second

Artificial Analysis says the trillion-parameter Moonshot AI model is the fastest frontier model ever measured, running on Cerebras hardware.

Alessandro Benigni

PUBLISHED MAY 20, 2026

Cerebras says Moonshot AI’s Kimi K2.6 is running at roughly 1,000 tokens per second in enterprise trials on its hardware, a figure that Artificial Analysis, which benchmarked the deployment, calls the highest throughput ever recorded for a frontier-class model. Cerebras announced the result in a post on X on May 19, 2026.

Kimi K2.6 is a trillion-parameter model from Moonshot AI, the Beijing-based lab. Cerebras, which builds wafer-scale inference chips designed specifically for high-throughput generation, is hosting the model for enterprise customers. Artificial Analysis, the independent inference benchmarking service, supplied the throughput figure.

The number matters for agent builders because token generation speed sets a floor on how long a multi-step reasoning chain takes to complete. At 1,000 tokens per second, a 32,000-token agent loop that might take two minutes on a GPU cluster (roughly 270 tokens per second) would finish generating in about 32 seconds, before accounting for tool calls and network latency. Groq, the other specialist inference chip company that has made speed a core claim, does not currently list Kimi K2.6 in its model catalog.
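The arithmetic behind that comparison can be sketched in a few lines. The snippet assumes generation time is simply output tokens divided by sustained throughput, ignoring prefill, network, and tool-call latency. The function name and the GPU baseline rate (back-derived from the two-minute figure above, not measured) are illustrative assumptions.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Idealized decode time: output tokens divided by sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


LOOP_TOKENS = 32_000  # agent loop size used in the article

# ~1,000 tok/s is the reported figure for Cerebras-hosted Kimi K2.6.
fast = generation_seconds(LOOP_TOKENS, 1_000)

# Hypothetical baseline: the rate implied by "two minutes" for 32,000 tokens.
implied_gpu_rate = LOOP_TOKENS / 120
slow = generation_seconds(LOOP_TOKENS, implied_gpu_rate)

print(f"{fast:.0f} s at 1,000 tok/s")
print(f"{slow:.0f} s at {implied_gpu_rate:.0f} tok/s")
```

Under these assumptions the loop generates in 32 seconds rather than 120, a speedup of just under 4x. Real agent loops will see a smaller gain to the extent that time is spent outside token generation.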

Cerebras has not disclosed pricing for the enterprise trial, so it is not yet known whether the throughput advantage translates into a lower cost per useful output. Teams benchmarking long-context agent pipelines should add Cerebras-hosted K2.6 to their evaluation runs before committing to an inference provider for the second half of 2026.
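For teams running that kind of evaluation, a provider-agnostic timing harness might look like the sketch below. It assumes only that the provider's client can be wrapped as a Python iterable yielding one token (or chunk) at a time. The `measure_stream` name and the injectable `clock` parameter are hypothetical, and nothing here reflects a specific Cerebras or Moonshot AI API.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream; report time to first token and decode rate."""
    start = clock()
    first: Optional[float] = None
    last = start
    count = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        return {"tokens": 0, "ttft_s": None, "tokens_per_s": None}
    # Decode rate excludes the wait for the first token (prefill, queueing).
    decode_s = last - first
    rate = (count - 1) / decode_s if decode_s > 0 else None
    return {"tokens": count, "ttft_s": first - start, "tokens_per_s": rate}
```

Separating time to first token from the decode rate keeps queueing and prefill delays from being averaged into the tokens-per-second figure, which matters when comparing providers on long-context prompts.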

Source: Cerebras thread on X (https://threadreaderapp.com/thread/2056778123329274279.html), May 19, 2026; throughput benchmarked by Artificial Analysis.
