"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." - Noam Shazeer (second author of the Transformer paper, now CEO of Character.AI), from the SwiGLU paper: https://arxiv.org/abs/2002.05202v1