ML/Linguistics question: Did anyone ever establish for sure why reversing the input sequence in an LSTM encoder-decoder improved results (the trick reported in Sutskever et al.'s 2014 seq2seq paper)? I've seen claims about it introducing short-term dependencies (which presumably depends on the word order of the language pair) and about it making the task harder for the decoder, but I can't find anything concrete.
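For readers unsure what the trick actually is: below is a minimal sketch of input reversal in an encoder-decoder LSTM, assuming a PyTorch-style model; all names and sizes (`ReversedSeq2Seq`, `SRC_VOCAB`, `HID`, etc.) are illustrative, not from any particular implementation. The point it illustrates is the short-term-dependency argument itself: after flipping, the *first* source word is the *last* thing the encoder reads, so it sits closest in time steps to the first target word the decoder must emit.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128  # illustrative sizes

class ReversedSeq2Seq(nn.Module):
    """Encoder-decoder LSTM that reads the source sequence reversed.
    Reversing means the encoder's final hidden state is computed right
    after it sees the original *first* source word, i.e. the word most
    relevant to the decoder's first prediction."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src_ids, tgt_ids):
        # The trick: flip the source along the time dimension.
        src_rev = torch.flip(src_ids, dims=[1])
        _, state = self.encoder(self.src_emb(src_rev))
        # Decoder is conditioned only on the encoder's final (h, c) state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)

# Toy usage: batch of 2 sentences, source length 5, target length 6.
model = ReversedSeq2Seq()
src = torch.randint(0, SRC_VOCAB, (2, 5))
tgt = torch.randint(0, TGT_VOCAB, (2, 6))
logits = model(src, tgt)  # shape: (2, 6, TGT_VOCAB)
```

One practical caveat with this sketch: in real training you would reverse before padding (or flip only the non-pad tokens per sequence), since naively flipping a padded batch moves the padding to the front of the encoder's input.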