Meta’s superintelligence chief, Alexandr Wang, says the lab’s next frontier model has pulled even with OpenAI’s GPT-5.5 on benchmarks the industry tracks closely. The model carries the internal codename Watermelon and remains in training. Meta has given no timeline for when, or whether, it will ship.

The benchmark claim comes from Meta’s own leadership, not from an independent test or a published paper. That distinction matters: when a lab reports that its unreleased model matches a shipped competitor, it is describing internal evaluation runs, not a result the public can verify or reproduce.

Watermelon’s compute footprint is the more concrete data point. The model reportedly trains on an order of magnitude more compute than Muse Spark, Meta’s prior release. That jump signals Meta is willing to spend well beyond its last generation to reach frontier performance, a strategy that mirrors how OpenAI and Google DeepMind have scaled their own flagship models before launch.

Meta has repeatedly restructured its AI research organization over the past year, folding efforts into the superintelligence unit Wang now leads. A benchmark-parity claim from that unit functions as an internal progress signal as much as an external one: it tells Meta’s own researchers, and its investors, that the reorganization is producing frontier-caliber output before any product exists to prove it.

The absence of a launch date is itself informative. Frontier labs generally announce timelines once a model clears internal safety review and product integration work, not when it merely reaches benchmark parity. Watermelon’s status suggests it is earlier in that process than GPT-5.5 was when OpenAI made it generally available, even if the two models now score similarly on paper.

Benchmark parity also does not settle the questions that determine real-world usage: latency, cost per token, context handling, and how a model performs on tasks that do not resemble a leaderboard question. Meta has not disclosed any of those figures for Watermelon, and Lets Data Science’s report does not include them either. Nor has Meta said which benchmarks were run, how many attempts were allowed, or whether the comparison used GPT-5.5’s public API or a research configuration.
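To illustrate why the number of allowed attempts matters, here is a minimal sketch of pass@k, a widely used estimator in code-generation evaluation. Nothing in the report indicates that Meta or OpenAI scored either model this way. The function name `pass_at_k` and the sample figures (10 attempts, 3 correct) are hypothetical and chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts is correct.

    n: total attempts sampled for a task (hypothetical figure)
    c: how many of those n attempts were correct
    k: how many attempts the benchmark allows per task
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random subset of k attempts contains no correct one,
    # subtracted from 1. comb() returns 0 when k > n - c, giving exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Same hypothetical model, same task, different attempt budgets.
    for k in (1, 5):
        print(f"pass@{k}: {pass_at_k(n=10, c=3, k=k):.3f}")
```

Under these illustrative numbers, the same model scores 0.300 with one attempt and roughly 0.917 with five, so a parity claim cannot be read properly without knowing the attempt budget on each side.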

That gap between what was claimed and what was shown is worth naming directly: readers are being asked to trust a self-reported result from the team with the strongest incentive to report it favorably. The order-of-magnitude compute increase over Muse Spark is the one figure in this story that does not depend on Meta’s own scoring, though it too is reported rather than independently confirmed.

For operators, the near-term signal is competitive pressure rather than a product decision. Meta closing the gap on paper, even before shipping, raises the bar OpenAI and Google DeepMind must clear with their next releases. Teams evaluating model vendors should treat Watermelon as a compute-scale signal to watch, not as a benchmark to plan around, at least until Meta publishes independently verifiable results or opens access.

Reported by Lets Data Science on July 2, 2026.