Tilde Research open-sources an attention variant built for forgetting

Wall Attention adds per-channel multiplicative decay to the QK inner product, giving each query channel its own learned forgetting rate across long context.

Alessandro Benigni

PUBLISHED JUN 4, 2026

Tilde Research published Wall Attention under an MIT license on June 2, making the reference Triton kernels and training recipe publicly available on GitHub.

The mechanism bakes a per-channel, per-timestep decay directly into the attention score calculation. Standard softmax attention gives every token pair the same structural footing; Wall Attention assigns each query channel an independent, content-dependent rate at which earlier positions fade. Setting the decay rate to zero (a multiplicative factor of one) recovers vanilla attention, which makes the implementation a drop-in research baseline rather than a fork.
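To make the idea concrete, here is a minimal NumPy sketch of one way such a decay could enter the score. It is not taken from the Tilde Research repo. The parameterization (non-negative rates `g`, accumulated along the sequence and applied as `exp(-(G[t] - G[s]))` per channel) and the names `decayed_attention` and `causal_softmax` are assumptions made for illustration.

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax over positions s <= t."""
    T = scores.shape[0]
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def decayed_attention(q, k, v, g):
    """Single-head attention with a per-channel decay inside the QK product.

    q, k: (T, D); v: (T, Dv); g: (T, D) non-negative decay rates.
    ASSUMPTION: the g parameterization is hypothetical, not the repo's.
    """
    T, D = q.shape
    G = np.cumsum(g, axis=0)                  # cumulative decay per channel
    diff = G[:, None, :] - G[None, :, :]      # (T, T, D): G[t] - G[s]
    # Clip future positions (s > t) to zero decay; they are masked out anyway.
    decay = np.exp(-np.clip(diff, 0.0, None))
    # score[t, s] = sum_c q[t, c] * decay[t, s, c] * k[s, c]
    scores = np.einsum("tc,tsc,sc->ts", q, decay, k) / np.sqrt(D)
    return causal_softmax(scores) @ v
```

With `g` set to all zeros, every decay factor is one and the function reduces to ordinary causal softmax attention, which is the baseline property described above. The explicit `(T, T, D)` tensor is only for readability; a fused kernel would not materialize it.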

The repo ships two kernels: a fused forward and backward training kernel and a single-step decode kernel that reads a pre-rescaled KV cache so autoregressive generation avoids recomputing the full prefix on each token.
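One possible reading of a "pre-rescaled" cache, sketched below in NumPy, relies on the decay factoring as `exp(-G[t]) * exp(G[s])`: each key is stored already multiplied by `exp(G[s])`, so a decode step only rescales the new query and takes one pass over the cache. The class `RescaledKVCache`, its `step` method, and the decay parameterization are hypothetical and are not the repo's API.

```python
import numpy as np

class RescaledKVCache:
    """Toy single-head decode cache that stores keys pre-multiplied by exp(G).

    ASSUMPTION: illustrative only. Unbounded exp(G) overflows on long
    sequences, so a production kernel would need some form of renormalization.
    """

    def __init__(self, dim, v_dim):
        self.G = np.zeros(dim)               # running cumulative decay per channel
        self.k_scaled = np.empty((0, dim))   # rows: k[s] * exp(G[s])
        self.v = np.empty((0, v_dim))

    def step(self, q, k, v, g):
        """Append one token and return its attention output, shape (v_dim,)."""
        self.G = self.G + g
        self.k_scaled = np.vstack([self.k_scaled, k * np.exp(self.G)])
        self.v = np.vstack([self.v, v])
        q_scaled = q * np.exp(-self.G)       # fold exp(-G[t]) into the query
        # (q * exp(-G[t])) . (k[s] * exp(G[s])) = sum_c q_c k_c exp(-(G[t]-G[s]))
        scores = self.k_scaled @ q_scaled / np.sqrt(q.shape[0])
        w = np.exp(scores - scores.max())
        w = w / w.sum()
        return w @ self.v
```

Each call to `step` touches the cached rows once and never revisits the prefix's raw keys or decay rates, which is the property the decode kernel is described as providing.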

This sits squarely in a broader architectural argument visible across today’s research: unstructured context windows are not a permanent solution to long-range reasoning. On that view, memory organized with explicit decay, retrieval primitives, or dedicated state slots outperforms diffuse attention over millions of tokens. Wall Attention is the low-level kernel arm of that argument.

Labs building long-context fine-tunes should benchmark the decay variant against their baseline before committing to their next training run.

Tilde Research on GitHub (github.com/tilde-research/wall-attention-release), 2026-06-02.
