Tilde Research published Wall Attention under an MIT license on June 2, making the reference Triton kernels and training recipe publicly available on GitHub.

The mechanism bakes a per-channel, per-timestep decay directly into the attention score calculation. Standard softmax attention puts every token pair on equal structural footing; Wall Attention assigns each query channel an independent, content-dependent rate at which earlier positions fade. Setting the decay to zero recovers vanilla attention, which makes the implementation a drop-in research baseline rather than a fork.
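One way to picture a per-channel, per-timestep decay inside the score is the pure-Python sketch below. It is not taken from the release: the score formula, the function name `decayed_attention`, and the decay input `g` are assumptions made for illustration, and the reference Triton kernels may define the decay differently. The sketch assumes each channel's query-key product is multiplied by `exp(-(G[i][c] - G[j][c]))`, where `G` is the running sum of non-negative decay rates, so a channel's contribution to the score shrinks the further back position `j` lies.

```python
import math


def decayed_attention(q, k, v, g):
    """Causal softmax attention with an assumed per-channel decay.

    q, k, g: lists of shape [T][D]; v: list of shape [T][Dv].
    g holds non-negative decay rates (hypothetical input; in a real model
    these would be computed from the token content).
    """
    T, D = len(q), len(q[0])
    Dv = len(v[0])

    # Cumulative decay per channel: G[t][c] = g[0][c] + ... + g[t][c].
    G = []
    running = [0.0] * D
    for t in range(T):
        running = [running[c] + g[t][c] for c in range(D)]
        G.append(running)

    scale = 1.0 / math.sqrt(D)
    out = []
    for i in range(T):
        # Score against every position j <= i (causal mask).
        scores = []
        for j in range(i + 1):
            s = sum(
                q[i][c] * k[j][c] * math.exp(-(G[i][c] - G[j][c]))
                for c in range(D)
            )
            scores.append(scale * s)

        # Numerically stable softmax over the visible prefix.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append(
            [sum(w[j] * v[j][d] for j in range(i + 1)) / z for d in range(Dv)]
        )
    return out
```

With every entry of `g` set to zero, the exponential factor is 1 and the function reduces to ordinary causal softmax attention, which mirrors the drop-in property described above.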

The repo ships two kernels: a fused forward-and-backward training kernel, and a single-step decode kernel. The decode kernel reads a pre-rescaled KV cache, so autoregressive generation avoids recomputing the full prefix on each token.
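The idea of a pre-rescaled cache can be illustrated with a small pure-Python sketch. Everything here is hypothetical: the class name `RescaledKVCache`, its `step` method, and the decay formula are assumptions for illustration, not the release's API. The sketch assumes a decay factor of the form `exp(-(G_i - G_j))` per channel, which factors into `exp(-G_i)` on the query and `exp(+G_j)` on the key. Each key can then be rescaled once when it is written, and each decode step only rescales the new query.

```python
import math


class RescaledKVCache:
    """Hypothetical decode-time cache that stores keys pre-multiplied by exp(+G)."""

    def __init__(self, dim):
        self.dim = dim
        self.G = [0.0] * dim  # running cumulative decay per channel
        self.keys = []        # rescaled keys, one entry per past token
        self.values = []      # values, stored unchanged

    def step(self, q, k, v, g):
        """Append one token and return its attention output.

        q, k, g: lists of length dim; v: list of any length.
        g is this step's non-negative per-channel decay rate (assumed input).
        """
        D = self.dim
        self.G = [self.G[c] + g[c] for c in range(D)]

        # Rescale the new key once, at write time.
        self.keys.append([k[c] * math.exp(self.G[c]) for c in range(D)])
        self.values.append(list(v))

        # Rescale only the current query; cached keys are read as-is.
        q_scaled = [q[c] * math.exp(-self.G[c]) for c in range(D)]

        scale = 1.0 / math.sqrt(D)
        scores = [
            scale * sum(q_scaled[c] * kj[c] for c in range(D))
            for kj in self.keys
        ]

        # Numerically stable softmax over all cached positions.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        return [
            sum(w[j] * self.values[j][d] for j in range(len(w))) / z
            for d in range(len(v))
        ]
```

A caveat on this sketch: `exp(+G)` grows without bound as decay accumulates, so a naive version like this would overflow on long sequences. A production kernel would need some form of renormalization, and how the released decode kernel handles that is not covered here.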

This sits squarely in a broader architectural argument visible across today’s research: unstructured context windows are not a permanent solution to long-range reasoning. Memory organized with explicit decay, retrieval primitives, or dedicated state slots consistently outperforms diffuse attention over millions of tokens. Wall Attention is the low-level kernel arm of that argument.

Labs building long-context fine-tunes should benchmark the decay variant against their baseline before committing to their next training run.

Tilde Research on GitHub (github.com/tilde-research/wall-attention-release), 2026-06-02.