Standard Attention

This page visualizes data movement between HBM (off-chip high-bandwidth memory) and on-chip SRAM during standard self-attention. Every float is shuttled through the 4 memory controller lanes, one square at a time.

Interactive Visualization

Each square represents one float. Every transfer through the memory controller lanes is counted one square at a time, and the SRAM view shows the blocks actually loaded for each computation step.

Phase 1: Initial State
X (Input)
Q / Wq
K / Wk
V / Wv
S (scores)
P (softmax)
O (output)
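
For reference, the matrices in the legend can be related by a minimal NumPy sketch of standard self-attention. This is an illustration, not the visualization's own code: the function name `standard_attention`, the single-head setup, and the 1/sqrt(d) scaling of the scores are assumptions taken from the usual textbook formulation. The point is that S and P are both materialized in full as N×N arrays.

```python
import numpy as np

def standard_attention(X, Wq, Wk, Wv):
    """Single-head self-attention that materializes S and P in full (illustrative sketch)."""
    # Project the input into queries, keys, and values.
    Q = X @ Wq
    K = X @ Wk
    V = X @ Wv

    # S (scores): one entry per (query, key) pair, so N x N.
    d = Q.shape[-1]
    S = (Q @ K.T) / np.sqrt(d)  # scaling is assumed, as in the usual formulation

    # P (softmax): row-wise softmax of S, also N x N.
    # Subtracting the row maximum keeps exp() numerically stable.
    E = np.exp(S - S.max(axis=-1, keepdims=True))
    P = E / E.sum(axis=-1, keepdims=True)

    # O (output): weighted sum of the value rows, N x d.
    O = P @ V
    return S, P, O
```

With N = 8 tokens and a head dimension of 4, `S` and `P` each hold 64 floats while `O` holds only 32; the gap widens quadratically as N grows.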
S and P are the N×N matrices (N is the sequence length) that dominate HBM traffic in standard attention: each is written to HBM in full and then read back by the next step. Every square transferred is counted in the tally.
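
The tally can be approximated with a short back-of-the-envelope count. The sketch below assumes a simplified cost model that may differ from the one the visualization uses: each operand is read from HBM once per step, each result is written once, and nothing is reused across steps. The function name and the parameters `n`, `d_model`, and `d_head` are hypothetical.

```python
def standard_attention_hbm_traffic(n, d_model, d_head):
    """Count floats moved between HBM and SRAM per step (simplified model)."""
    steps = {
        # Read X and one weight matrix for each of Q, K, V; write each result.
        "project Q, K, V": {
            "read": 3 * (n * d_model + d_model * d_head),
            "write": 3 * n * d_head,
        },
        # Read Q and K, write the full N x N score matrix.
        "S = Q K^T": {"read": 2 * n * d_head, "write": n * n},
        # Read S back, write the full N x N probability matrix.
        "P = softmax(S)": {"read": n * n, "write": n * n},
        # Read P back along with V, write the output.
        "O = P V": {"read": n * n + n * d_head, "write": n * d_head},
    }
    total = sum(s["read"] + s["write"] for s in steps.values())
    # S and P are each written once and read once: four N x N transfers.
    n_by_n = 4 * n * n
    return steps, total, n_by_n


steps, total, n_by_n = standard_attention_hbm_traffic(n=4, d_model=2, d_head=2)
print(total, n_by_n)  # 156 64
```

Under this model the S and P transfers grow quadratically in the sequence length while every other term grows linearly, so their share of the total rises as `n` increases.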