Cerebras reports Moonshot AI’s Kimi K2.6 running at roughly 1,000 tokens per second in enterprise trials, a figure Artificial Analysis says is the highest throughput ever recorded for a frontier-class model. The announcement came via a Cerebras post on X on May 19, 2026.

Kimi K2.6 is a trillion-parameter model from Moonshot AI, the Beijing-based lab. Cerebras, which builds wafer-scale inference chips designed specifically for high-throughput generation, is hosting the model for enterprise customers. Artificial Analysis, the independent inference benchmarking service, supplied the throughput figure.

The number matters for agent builders because token generation speed largely determines how long a multi-step reasoning chain takes to complete. At 1,000 tokens per second, generating 32,000 tokens takes about 32 seconds, so a 32,000-token agent loop that might take two minutes on a GPU cluster finishes in roughly half a minute. Groq, another specialist inference chip company that has made speed a core claim, does not currently list Kimi K2.6 in its model catalog.
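The arithmetic behind that comparison can be checked with a minimal sketch. It assumes every token in the loop is generated sequentially at a steady rate and ignores prompt processing, tool-call latency, and network overhead, none of which the announcement quantifies. The function and variable names are illustrative only, and the two-minute GPU figure is the article's own hypothetical, not a measured benchmark.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit `tokens` sequentially at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


LOOP_TOKENS = 32_000    # agent loop size used in the article's example
REPORTED_RATE = 1_000.0  # reported throughput, tokens per second
GPU_SECONDS = 120.0      # the article's hypothetical "two minutes" on a GPU cluster

fast_seconds = generation_seconds(LOOP_TOKENS, REPORTED_RATE)
implied_gpu_rate = LOOP_TOKENS / GPU_SECONDS  # rate implied by the two-minute case
speedup = GPU_SECONDS / fast_seconds

print(f"{fast_seconds:.0f} s at {REPORTED_RATE:.0f} tok/s")
print(f"implied GPU rate: {implied_gpu_rate:.0f} tok/s")
print(f"speedup: {speedup:.2f}x")
```

Under these assumptions the two-minute case corresponds to about 267 tokens per second, so the reported rate would be a 3.75x reduction in generation time. Real agent loops interleave generation with tool calls, so the end-to-end gain is likely to be smaller.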

Cerebras has not disclosed pricing for the enterprise trial, so whether the throughput advantage translates to lower cost-per-useful-output remains unconfirmed. Teams benchmarking long-context agent pipelines should add Cerebras-hosted K2.6 to their evaluation runs before committing to an inference provider for the second half of 2026.

Reported by Cerebras via X thread (https://threadreaderapp.com/thread/2056778123329274279.html) and benchmarked by Artificial Analysis, 2026-05-19.