Math shows AI's progress is spiky, not smooth, says Grant Sanderson

The 3Blue1Brown creator argues AI's split performance on math olympiad problems previews how automation will hit the wider economy unevenly.

Alessandro Benigni

PUBLISHED JUL 1, 2026

AI models now solve International Mathematical Olympiad (IMO) geometry problems in under twenty seconds, yet still stumble on the same competition’s combinatorics questions. That gap, according to 3Blue1Brown creator Grant Sanderson, is not a curiosity confined to math contests. It is a preview of how AI capability will spread through the rest of the economy: fast and total in some domains, slow and partial in others, with little warning about which is which.

Sanderson made the case on a podcast with Dwarkesh Patel, where he discussed a project he is building to document AI’s progress in mathematics. He chose math, he said, because the field is producing the fastest and clearest signal of where AI capability is heading. The IMO splits its six annual problems across four categories: geometry, number theory, algebra, and combinatorics. Sanderson said models have effectively cold-solved geometry through what amounts to a brute-force search method, the same shortcut some human competitors quietly rely on. Models would have taken gold at the 2024 olympiad, he said, if that year’s test had not happened to include two combinatorics problems instead of the usual mix weighted toward the other three categories.

Combinatorics is different because it rewards what Sanderson called playful, puzzle-like thinking rather than a systematic procedure. Geometry problems tend to yield to grinding through known techniques. Combinatorics problems are often designed by their authors specifically to resist that kind of training, demanding a genuine conceptual leap instead of pattern-matching against solved cases. That distinction, brute-forceable structure versus open-ended creative reasoning, is the throughline Sanderson uses to reason about which jobs and tasks AI will absorb first.

The framing matters because it complicates the assumption that AI progress arrives as a single wave. Patel, in the same conversation, recalled asking Sanderson three years earlier whether an IMO gold medal would effectively mean AGI. Sanderson’s answer then, which he said has held up, was that gold would be just another benchmark cleared, not a discontinuous moment. Nothing in the intervening years produced a single “aha” moment either. Instead, capability has advanced along what Sanderson described as a fractal, spiky frontier: zoom into any one field and the unevenness repeats at a smaller scale.

For operators, the practical takeaway is that “AI is good at math” or “AI is good at coding” are both too coarse to plan around. The geometry-versus-combinatorics split suggests the real dividing line runs through problem structure rather than subject label: tasks with a known solution procedure that can be searched or trained against fall quickly, while tasks whose difficulty is that no reliable procedure exists resist much longer. Sanderson noted that the IMO’s own designers try to write problems specifically immune to that kind of training, and models still miss a meaningful share of them.

Sanderson also raised the question of who benefits once an AI system generates mathematics faster than humans can verify or curate it. He argued that if AI produces a large volume of new mathematical results, the economic value shifts to whoever can judge which results matter and point the system toward useful directions, a curation role rather than a production one. That argument extends beyond math: as generation gets cheap in any domain, judgment about what to generate and why becomes the scarcer skill.

Operators evaluating where to deploy AI in the next quarter should audit their own task list the way Sanderson audits IMO categories: separate the “geometry” tasks, ones with a known procedure a model can search or brute-force, from the “combinatorics” tasks that require a genuine conceptual leap. The first group is a near-term automation target. The second group is where human judgment retains value longer than the current hype cycle suggests, and where curation, not generation, becomes the higher-leverage role.

Reported by Dwarkesh Patel on June 26, 2026.
