The next chapter on transformers is up on YouTube, digging into the attention mechanism: https://youtu.be/eMlx5fFNoYc The model works with vectors representing tokens (think words), and attention is the mechanism that lets those vectors take in meaning from context.
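The core idea can be sketched in a few lines of NumPy: each token vector produces a query, key, and value, and the attention weights decide how much each token's vector is updated by every other token's context. This is a minimal single-head sketch, not the video's exact notation; the projection matrices `Wq`, `Wk`, `Wv` and the toy data are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Score how relevant each token is to every other token
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output vector is a context-weighted mix of the value vectors
    return weights @ V

# Three "tokens" with 4-dimensional embeddings (random toy data)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 4): same number of tokens, now context-aware
```

The output has the same shape as the input: attention doesn't change how many vectors there are, it updates each one using information from the others.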