Flash Attention

This page visualizes data movement between HBM and on-chip SRAM during Flash Attention (Dao et al. 2022). The score and probability matrices never leave SRAM, and every float transferred between the two memories is tallied.


Interactive Visualization

Every square represents one float. Unlike standard attention, the N×N score (S) and probability (P̃) matrices are never materialized in full: each tile is computed, consumed, and discarded inside SRAM, so none of it ever touches HBM.
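To make the tally concrete, here is a rough sketch of how the HBM float counts could be compared. It is not the visualization's own accounting. It assumes the projections from X are already done, that standard attention writes and re-reads both S and P̃ in HBM, and that Flash Attention uses the loop order described in the paper (K/V blocks in the outer loop, Q blocks in the inner loop, with O, m and ℓ reloaded and written back on each pass). The function names and the `block_cols` parameter are hypothetical, and the one-time initialization of O, m and ℓ is ignored.

```python
def standard_attention_traffic(n, d):
    """Floats moved to or from HBM when S and P are materialized in HBM."""
    qk_read = 2 * n * d       # read Q and K to form S = Q K^T
    s_write = n * n           # write S
    s_read = n * n            # read S back for the softmax
    p_write = n * n           # write P
    pv_read = n * n + n * d   # read P and V to form O = P V
    o_write = n * d           # write O
    return qk_read + s_write + s_read + p_write + pv_read + o_write


def flash_attention_traffic(n, d, block_cols):
    """Floats moved when K/V tiles are the outer loop (assumed loop order)."""
    kv_blocks = -(-n // block_cols)      # ceil(n / block_cols)
    kv_read = 2 * n * d                  # each K and V tile is loaded once
    per_pass_read = 2 * n * d + 2 * n    # Q, O, m and l reloaded per K/V tile
    per_pass_write = n * d + 2 * n       # O, m and l written back per K/V tile
    return kv_read + kv_blocks * (per_pass_read + per_pass_write)


if __name__ == "__main__":
    n, d = 1024, 64
    print("standard:", standard_attention_traffic(n, d))
    print("flash:   ", flash_attention_traffic(n, d, block_cols=256))
```

Under these assumptions the standard count contains four N² terms while the tiled count contains none. The tiled count instead grows with the number of K/V tiles, so the saving depends on how large a tile fits in SRAM.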

Phase 1: Initial State
X (Input)
Q / Wq
K / Wk
V / Wv
S tile (SRAM only)
P̃ tile (SRAM only)
O (output)
m (row-max stats)
ℓ (row-sum stats)
S and P̃ tiles exist only in an SRAM scratch slot and are never written to HBM. This removes the O(N²) HBM reads and writes of S and P̃ that dominate standard attention's memory traffic.
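The role of the m and ℓ statistics can be illustrated with a small pure-Python sketch of tiled attention using an online softmax. It is an illustration under simplifying assumptions, not the kernel from the paper: the 1/√d scaling is omitted, normalization by ℓ is deferred to the end rather than applied at every step, and plain lists stand in for HBM and SRAM. The names `naive_attention`, `tiled_attention` and `block_cols` are made up for this example.

```python
import math


def naive_attention(Q, K, V):
    """Reference: builds a full row of scores for every query."""
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in K]
        mx = max(scores)
        probs = [math.exp(s - mx) for s in scores]
        total = sum(probs)
        out.append([sum(p * v[c] for p, v in zip(probs, V)) / total
                    for c in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_cols=2):
    """Visits K/V one tile at a time, keeping only m, l and O per row."""
    n, dv = len(Q), len(V[0])
    m = [-math.inf] * n                  # running row max
    l = [0.0] * n                        # running row sum of exp(score - m)
    O = [[0.0] * dv for _ in range(n)]   # unnormalized output accumulator
    for start in range(0, len(K), block_cols):
        K_tile = K[start:start + block_cols]
        V_tile = V[start:start + block_cols]
        for i, q in enumerate(Q):
            # S tile row: exists only inside this loop body
            s_tile = [sum(a * b for a, b in zip(q, k)) for k in K_tile]
            m_new = max(m[i], max(s_tile))
            # P tile row, shifted by the updated row max
            p_tile = [math.exp(s - m_new) for s in s_tile]
            # Rescale earlier accumulations to the new max (exp(-inf) == 0.0)
            scale = math.exp(m[i] - m_new)
            l[i] = l[i] * scale + sum(p_tile)
            O[i] = [O[i][c] * scale
                    + sum(p * v[c] for p, v in zip(p_tile, V_tile))
                    for c in range(dv)]
            m[i] = m_new
    # Final normalization by the row sums
    return [[O[i][c] / l[i] for c in range(dv)] for i in range(n)]
```

Because each tile's contribution is rescaled whenever the row max changes, the result matches the reference up to floating-point error, even though no full row of S or P̃ is ever stored.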