See e.g. https://arxiv.org/abs/2410.18613, which recently showed that we can replace softmax attention with alternatives that do not satisfy the properties we intuitively assign to it, and yet these models seem to work just as well! (2/2)
Comments
PoM: Efficient Image and Video Generation with the Polynomial Mixer
http://arxiv.org/abs/2411.12663