An OpenAI reasoning model has autonomously disproved a conjecture tied to the planar unit distance problem, an open question in combinatorial geometry posed by Paul Erdos in 1946, according to an announcement published by OpenAI on May 20.
The unit distance problem asks for the maximum number of pairs of points, among n points in the plane, that can be exactly one unit apart. Erdos conjectured specific bounds on this count. The OpenAI model produced a proof that those bounds do not hold, resolving a question that had resisted human effort for roughly eight decades.
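To make the quantity concrete, here is a minimal Python sketch that counts unit-distance pairs in a small point set by brute force. The function name `count_unit_pairs`, the tolerance parameter, and the example point sets are illustrative assumptions; none of them comes from the OpenAI announcement or the paper, and the sketch says nothing about the model's construction.

```python
from itertools import combinations
from math import dist, isclose, sqrt


def count_unit_pairs(points, tol=1e-9):
    """Count unordered pairs of points at Euclidean distance 1 (within tol).

    Hypothetical helper for illustration only: a brute-force check over
    all pairs, which is fine for tiny examples.
    """
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


# A 3x3 integer grid: only horizontally or vertically adjacent
# points are exactly one unit apart.
grid = [(x, y) for x in range(3) for y in range(3)]

# An equilateral triangle with side 1: every pair is a unit pair.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, sqrt(3) / 2)]

print(count_unit_pairs(grid))      # 12
print(count_unit_pairs(triangle))  # 3
```

The problem itself concerns how large this count can be as n grows, over every possible arrangement of n points, which a brute-force count on fixed examples cannot answer.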
The technique the model introduced draws from algebraic number theory: it constructs algebraic number fields of large degree and small discriminant through an argument based on the Golod-Shafarevich criterion. That construction is not obvious. It builds on prior theoretical frameworks from Ellenberg-Venkatesh and Hajir-Maire-Ramakrishna, and applying it to the unit distance problem required a non-trivial creative step. Nine external mathematicians, including Noga Alon, W.T. Gowers, Will Sawin, and Melanie Matchett Wood, independently reviewed and verified the proof, according to the paper posted to arXiv.
OpenAI describes this as a significant milestone for AI-assisted mathematics. That framing deserves context. DeepMind’s AlphaProof and AlphaGeometry systems, announced in 2024, solved four of the six problems from that year’s International Mathematical Olympiad, a score at silver-medal level, before any OpenAI model claimed a comparable result. DeepMind’s FunSearch, published in December 2023, discovered new constructions for the cap set problem, another longstanding combinatorics question. OpenAI’s result is meaningfully different in that the target conjecture is a named open problem from the research literature rather than a competition problem, but the claim that an AI has “resolved a prominent unsolved mathematics problem” for the first time is not accurate against the documented record.
What is new here is the mode of verification: nine credentialed external mathematicians reviewed an AI-generated proof of a research-level conjecture and confirmed it. That is a higher bar than benchmark performance, and it matters. The model did not merely check a known proof or optimize a known approach; it introduced a technique that human reviewers found correct and novel enough to warrant a paper listing them as co-authors.
The specific model name OpenAI used for this work was not disclosed in the announcement. That omission makes it difficult to reproduce the result, calibrate it against public model versions, or estimate whether the capability generalizes to other open conjectures.
For AI research teams evaluating reasoning models, this result suggests that current frontier models can contribute to original mathematics at a level that satisfies peer review, not merely pass math benchmarks. The distinction between “solves benchmark problems” and “produces verifiable novel proofs” is the threshold that matters for scientific credibility, and this result sits on the far side of it. Teams building math-adjacent research tools should treat external mathematician verification, not internal benchmark scores, as the standard worth targeting.
Reported by OpenAI on 2026-05-20.