Google's Titans: a new architecture combining attention with a meta in-context memory that learns how to memorize at test time, as presented by one of the authors, @alibehrouz.bsky.social
Comments
Titans are more effective than Transformers and modern linear RNNs, and can effectively scale to context windows larger than 2M tokens, with better performance than ultra-large models (e.g., GPT-4, Llama3-80B).
that's very interesting. i need to read it for details. they refer to TTT, which i'm skeptical of bc you don't have labels at test time, but maybe this is different
i'm absolutely *fascinated* by their allusion to smart forgetting
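On both points above: as I understand the paper's update rule, the memory's test-time targets are self-supervised (the value projection of the current token), so no external labels are needed, and the "smart forgetting" is an adaptive weight-decay gate inside the same update. Below is a minimal NumPy sketch of that idea, not the authors' code: the linear memory (the paper uses an MLP), the fixed gate values, the toy dimensions, and the helper names `memorize`/`recall` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # toy hidden dimension (assumption)

# Frozen projections (learned in normal training, fixed at test time).
W_K = rng.standard_normal((d, d)) / np.sqrt(d)   # key projection
W_V = rng.standard_normal((d, d)) / np.sqrt(d)   # value projection
W_Q = rng.standard_normal((d, d)) / np.sqrt(d)   # query projection

M = np.zeros((d, d))      # long-term memory (a linear map here so the
                          # gradient stays explicit; the paper uses an MLP)
S = np.zeros((d, d))      # momentum buffer: accumulated "past surprise"

# Gates: learning rate, momentum, forgetting. Fixed scalars for clarity;
# the paper makes them data-dependent, computed per token.
theta, eta, alpha = 0.01, 0.9, 0.01

def memorize(x_t):
    """One test-time step: write the association k_t -> v_t into M.
    The target v_t comes from the input itself, so no labels are needed."""
    global M, S
    k_t, v_t = x_t @ W_K, x_t @ W_V
    err = k_t @ M - v_t                   # "surprise": the memory's error
    grad = 2.0 * np.outer(k_t, err)       # grad of ||k_t M - v_t||^2 wrt M
    S = eta * S - theta * grad            # momentum over surprise
    M = (1.0 - alpha) * M + S             # weight decay = the forgetting gate

def recall(x_t):
    """Read memory with the query projection; no weights change."""
    return (x_t @ W_Q) @ M

# Repeatedly memorizing one token drives its associative error down.
x = rng.standard_normal(d)
k, v = x @ W_K, x @ W_V
print("before:", float(np.linalg.norm(k @ M - v)))
for _ in range(50):
    memorize(x)
print("after: ", float(np.linalg.norm(k @ M - v)))
```

The `alpha` gate is the forgetting the comment above is fascinated by: pushed toward 1 it erases stale memory, at 0 the memory only accumulates; per the paper, the model chooses this per token rather than using a fixed constant.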
Post on X: https://x.com/behrouz_ali/status/1878859086227255347