Standard Attention — Visualization

Interactive Visualization

Every square represents one float. Transfers are accounted for square-by-square through the memory controller lanes. SRAM shows the actual blocks loaded for each computation step.

Phase 1: Initial State

Ready

X (Input)

Q / Wq

K / Wk

V / Wv

S (scores)

P (softmax)

O (output)

S and P are the large N×N matrices that dominate HBM traffic in standard attention. Every square transferred is fully accounted for in the tally.