See e.g. https://arxiv.org/abs/2410.18613, which recently showed that softmax attention can be replaced with alternatives that do not satisfy the properties we intuitively assign to it — and yet the resulting models seem to work just as well! (2/2)
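To make the contrast concrete, here is a minimal sketch of standard softmax attention next to one generic non-softmax variant (elementwise sigmoid scores). This is an illustrative example of the kind of substitution being discussed, not necessarily the specific construction in the linked paper; `softmax_attention` and `sigmoid_attention` are names chosen here for clarity.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: each row of the score
    # matrix is normalized into a probability distribution via softmax.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def sigmoid_attention(Q, K, V):
    # A non-softmax alternative: elementwise sigmoid on the scores.
    # Rows no longer sum to 1, so the weights are not a probability
    # distribution -- one of the "intuitive" properties being dropped.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = 1.0 / (1.0 + np.exp(-scores))
    return weights @ V
```

Both variants map the same (Q, K, V) inputs to an output of the same shape; the only difference is whether the attention weights form a convex combination of the values.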