[Figure: bar chart comparing access latency for registers, shared memory, L2 cache, and global/local memory.]
Taller bars mean longer latency. Register spilling forces data out of registers and into local memory, which resides in the same slow off-chip device memory as global memory.
Register: fastest access, ~1 cycle
Shared Memory: fast access, ~20-30 cycles
L2 Cache: moderate access, ~200 cycles
Global/Local: slowest access, 400+ cycles
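
To get a rough feel for what these numbers imply, the sketch below computes the average cost per access for a given mix of memory tiers. It is only back-of-the-envelope arithmetic: the latencies are the approximate figures listed above (with "400+" treated as 400 and "~20-30" as 25), the function name `average_latency` and the 5% spill fraction are hypothetical, and the model assumes every spilled access pays the full global/local latency, ignoring caching and the latency hiding a real GPU performs.

```python
# Approximate latencies in cycles, taken from the list above.
# Assumptions: "400+" is treated as 400 and "~20-30" as its midpoint, 25.
LATENCY_CYCLES = {
    "register": 1,
    "shared": 25,
    "l2": 200,
    "global_local": 400,
}


def average_latency(access_mix):
    """Return the expected cycles per access for a mix of memory tiers.

    access_mix maps a tier name to the fraction of accesses served by it.
    The fractions must sum to 1.
    """
    total = sum(access_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(LATENCY_CYCLES[tier] * frac for tier, frac in access_mix.items())


# Hypothetical kernel whose operands all stay in registers.
no_spill = average_latency({"register": 1.0})

# Same kernel, but 5% of accesses are spilled and pay the full
# global/local latency (a worst case: no cache hits, no latency hiding).
with_spill = average_latency({"register": 0.95, "global_local": 0.05})

print(f"no spill:   {no_spill:.2f} cycles per access")
print(f"with spill: {with_spill:.2f} cycles per access")
```

Under these assumptions, even a small fraction of spilled accesses dominates the average, raising it from 1 cycle to roughly 21 cycles per access. Real kernels will see a smaller penalty when spills hit in cache or when other warps hide the latency.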