thomlake.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

thomlake.bsky.social

Principal Scientist at Indeed. PhD Student at UT Austin. AI, Deep Learning, PGMs, and NLP.

6 posts 697 followers 409 following

comment in response to post

Due to the split between the input statements and the query, the resulting model isn't a generic sequence processor like RNNs or transformers. However, if you were to process a sequence by treating each element as a new query, you'd get something that looks a lot like a transformer.
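A minimal sketch of that last point, under loud assumptions: plain dot-product attention with no learned projections, no masking, and no multiple heads, and with hypothetical helper names `attend` and `every_element_as_query`. Each element of the sequence takes a turn as the query while the whole sequence serves as the memories, which gives one output per position, much like a single self-attention layer.

```python
import math

def attend(query, memories):
    """Read from the memories with softmax dot-product attention."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * mem[d] for w, mem in zip(weights, memories)) for d in range(dim)]

def every_element_as_query(sequence):
    """Treat each element as a new query over the full sequence (assumed setup)."""
    return [attend(element, sequence) for element in sequence]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = every_element_as_query(sequence)
```

The output has the same length as the input, which is the property that makes this variant usable as a generic sequence processor.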

submitted 205 days ago

comment in response to post

MemNets first encode each input sentence/statement independently, with a position embedding. These are the "memories". Then you encode the query and apply cross-attention between it and the memories. Rinse and repeat for some fixed depth. No for-loop over time here.
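The steps above can be sketched roughly as follows. This is an illustration, not the paper's implementation: the additive position embedding, the reuse of the same vectors as attention keys and values, and the names `encode`, `hop`, and `memnet` are all simplifying assumptions. The residual update of the query state follows the multi-hop form used in end-to-end memory networks.

```python
import math

def encode(token_vectors, position_vectors):
    """Encode one sentence as a single vector (assumed: sum of token + position)."""
    dim = len(token_vectors[0])
    return [sum(t[d] + p[d] for t, p in zip(token_vectors, position_vectors))
            for d in range(dim)]

def hop(state, memories):
    """One cross-attention hop: the query state attends over all memories."""
    scores = [sum(s * m for s, m in zip(state, mem)) for mem in memories]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    read = [sum(w * mem[d] for w, mem in zip(weights, memories))
            for d in range(len(state))]
    # Residual update: new state = old state + attention read
    return [s + r for s, r in zip(state, read)], weights

def memnet(query, memories, depth=3):
    """Repeat the hop for a fixed depth; the loop is over depth, not time."""
    state, weights = query, None
    for _ in range(depth):
        state, weights = hop(state, memories)
    return state, weights

memories = [[1.0, 0.0], [0.0, 1.0]]  # each memory is one encoded sentence
state, weights = memnet([2.0, 0.0], memories, depth=2)
```

Note that each memory is encoded once, independently of the others, and only the query state changes from hop to hop.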

submitted 205 days ago

comment in response to post

The recurrence there is referencing depth-wise weight tying (see Section 2.2):
> Layer-wise (RNN-like): the input and output embeddings are the same across different layers
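To make the depth-wise tying concrete, here is a generic sketch. It stands in a small linear map for each layer's parameters, which is an assumption for illustration only (the quoted passage ties embeddings, not a linear layer), and `build_layers` and `apply_layers` are hypothetical names.

```python
def build_layers(depth, init, tied):
    """Return per-layer parameters; when tied, every layer shares one object."""
    if tied:
        shared = init()
        return [shared for _ in range(depth)]
    return [init() for _ in range(depth)]

def apply_layers(state, layers):
    """Apply each layer's matrix to the state in turn (loop over depth)."""
    for weights in layers:
        state = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return state

def init():
    # Hypothetical fixed initializer so the example is deterministic
    return [[0.5, 0.0], [0.0, 0.5]]

tied = build_layers(3, init, tied=True)     # same parameters at every depth
untied = build_layers(3, init, tied=False)  # separate parameters per layer
result = apply_layers([1.0, 1.0], tied)
```

With tied parameters the same function is applied repeatedly across depth, which is the sense in which the model is "RNN-like" even though nothing recurs over time.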

submitted 205 days ago

comment in response to post

Memory networks were earlier, attention-only, and had position embeddings, but were not word/token level: arxiv.org/abs/1503.08895. They were later elaborated with the key-value distinction, which is, AFAIK, where this terminology arises: arxiv.org/abs/1606.03126

submitted 206 days ago

comment in response to post

👋

submitted 212 days ago