Graphcore basically slapped a bit of SRAM onto every core, right *inside* the processor tile. GPU memory (VRAM) lives off the compute die, super far away by comparison. The ~2,400 cores on the Graphcore C2 can each run their own independent task instead of executing in lockstep groups of threads (a.k.a. MIMD instead of SIMT).
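Rough intuition for the MIMD vs SIMT part, as a toy cost model (my own simplification, not how Graphcore or any GPU actually schedules work, and it ignores memory entirely). Under SIMT, a lockstep group whose threads take different branches ends up running every distinct branch one after another, masking off the threads that didn't take it. Under MIMD, each core follows its own instruction stream, so the group is only as slow as its slowest core. The names `simt_steps` and `mimd_steps` and the branch costs are hypothetical.

```python
from typing import Dict, List


def simt_steps(branch_of_thread: List[str], branch_cost: Dict[str, int]) -> int:
    """Toy SIMT model: the lockstep group pays for every distinct branch
    any of its threads took, executed serially with masking."""
    return sum(branch_cost[b] for b in set(branch_of_thread))


def mimd_steps(branch_of_thread: List[str], branch_cost: Dict[str, int]) -> int:
    """Toy MIMD model: each core runs its own branch independently,
    so the group finishes when the slowest core finishes."""
    return max(branch_cost[b] for b in branch_of_thread)


if __name__ == "__main__":
    # Hypothetical per-branch costs in "steps"
    cost = {"a": 10, "b": 30, "c": 5}
    threads = ["a", "b", "c", "a"]  # four threads, three distinct branches
    print("SIMT:", simt_steps(threads, cost))  # 10 + 30 + 5
    print("MIMD:", mimd_steps(threads, cost))  # max(10, 30, 5, 10)
```

When every thread takes the same branch the two models agree; the gap only opens up when the work is divergent, which is exactly the case where independent cores stop "waiting on each other."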