Sakana Labs claims a block-wise training method that bypasses end-to-end backprop

A diffusion-style forward pass on independent blocks reportedly cuts the memory needed to train deep networks, addressing the memory wall that frontier training has been hitting.

Alessandro Benigni

PUBLISHED MAY 30, 2026

Sakana Labs published a result on May 29, summarized in an X post from David Ha, claiming a new training method for deep neural networks. The method breaks the network into independent blocks and trains each block locally, sidestepping the memory cost of holding the entire network's intermediate state during end-to-end backpropagation.
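To make the memory argument concrete, here is a back-of-the-envelope sketch in Python. It is not taken from the Sakana Labs announcement: the function name, the layer counts, and the batch and width figures are all hypothetical, and the accounting deliberately ignores weights, optimizer state, and activation checkpointing. It only compares how many layer outputs must be resident at once when gradients flow through the whole network versus through a single block.

```python
def peak_activation_bytes(num_layers, layers_per_block, batch, width, bytes_per_value=2):
    """Rough peak activation memory for one training step (illustrative only).

    End-to-end backprop keeps every layer's output until the backward pass.
    Block-local training only needs the outputs of the block being updated.
    """
    per_layer = batch * width * bytes_per_value  # one layer's stored output
    end_to_end = num_layers * per_layer
    block_local = layers_per_block * per_layer
    return end_to_end, block_local


# Hypothetical configuration, chosen only for illustration.
e2e, local = peak_activation_bytes(
    num_layers=96, layers_per_block=8, batch=4096, width=8192
)
print(
    f"end-to-end: {e2e / 2**30:.1f} GiB, "
    f"block-local: {local / 2**30:.1f} GiB, "
    f"ratio: {e2e / local:.0f}x"
)
# prints "end-to-end: 6.0 GiB, block-local: 0.5 GiB, ratio: 12x"
```

Under these assumed numbers the saving is simply the ratio of total layers to layers per block, which is why the claim matters more as networks get deeper.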

The technical claim is that the forward pass is treated as a diffusion-style process that denoises a signal, which is what lets each block be trained locally. If the result holds at scale, the memory savings could be substantial: end-to-end backprop, which stores intermediate activations across the whole network for the backward pass, has been a binding memory constraint on training the largest networks for a decade.
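The announcement does not spell out the algorithm, so the following is only a toy sketch of the general idea of block-local denoising training, not Sakana Labs' method. Everything in it is an assumption made for illustration: the linear blocks, the noise schedule in `sigmas`, the target `y = 2x`, and the helper names. Each block is trained on its own to recover a clean target from a noised copy of it, and no gradient ever passes from one block to another.

```python
import random


def make_block():
    # Hypothetical linear block: y_hat = w_x * x + w_z * z + c
    return {"w_x": 0.0, "w_z": 0.0, "c": 0.0}


def block_forward(block, x, z):
    return block["w_x"] * x + block["w_z"] * z + block["c"]


def train_block(block, sigma, rng, steps=3000, lr=0.03):
    """Train one block on its own denoising loss; no other block is touched."""
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x                          # toy clean target
        z = y + sigma * rng.gauss(0.0, 1.0)  # noised copy of the target
        err = block_forward(block, x, z) - y
        # SGD step on the local loss 0.5 * err**2
        block["w_x"] -= lr * err * x
        block["w_z"] -= lr * err * z
        block["c"] -= lr * err


def predict(blocks, sigmas, x, rng):
    """Inference: start from noise and let each block refine the estimate."""
    z = sigmas[0] * rng.gauss(0.0, 1.0)
    for block in blocks:
        z = block_forward(block, x, z)
    return z


rng = random.Random(0)
sigmas = [1.0, 0.5, 0.1]  # assumed noise schedule, noisiest block first
blocks = [make_block() for _ in sigmas]

# Each block is trained in isolation, so the blocks could be trained
# one at a time (or on separate devices) without a shared backward pass.
for block, sigma in zip(blocks, sigmas):
    train_block(block, sigma, rng)

prediction = predict(blocks, sigmas, 0.5, rng)
print(f"prediction for x=0.5: {prediction:.3f} (target 1.0)")
```

Because every block sees noised targets sampled directly during training rather than the output of the previous block, only one block's parameters and activations need to be in memory at a time. Whether a scheme of this kind matches end-to-end training quality on real networks is exactly what the claimed result would have to demonstrate.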

Two caveats are immediate. First, the announcement is an X post, not a peer-reviewed paper. Second, Sakana Labs has previously published results that did not always reproduce cleanly outside its own lab. Independent replication on a standard benchmark is the verification step that matters here.

For frontier-scale training infrastructure teams, this is a result worth tracking but not yet a result worth re-architecting around. Watch for a paper, a code release, or an independent replication before reading it as a fundamental shift in how networks get trained.

Posted by David Ha on X on 2026-05-29.
