koustuvsinha.com
🔬 Research Scientist, Meta AI (FAIR). 🎓 PhD from McGill University + Mila. 🙇‍♂️ I study Multimodal LLMs, Vision-Language Alignment, LLM Interpretability & I’m passionate about ML Reproducibility (@reproml.org) 🌎 https://koustuvsinha.com/
17 posts · 293 followers · 434 following
Regular Contributor
Conversation Starter
comment in response to post
Congrats, nice and refreshing papers, especially the word confusion idea! We need better similarity methods, and it's good to see developments on this front! Curious whether the confusion similarity depends on the size of the classifier's label set?
comment in response to post
Many many congratulations!! 🥳🎉🎉
comment in response to post
Another factor that makes simple MLPs work is visual token length: if you need shorter visual token sequences, you need a better mapper. These days most LLMs handle long context, which reduces the need to compress visual tokens.
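(For context on the "better mapper" point above: compressing visual tokens usually means something like a Perceiver-style resampler rather than a plain MLP. Below is a minimal sketch, assuming a PyTorch setup; `QueryResampler`, its dimensions, and the query count are illustrative choices, not code from any specific paper.)

```python
import torch
import torch.nn as nn


class QueryResampler(nn.Module):
    """Compresses a long sequence of visual tokens into a fixed, smaller
    set of learned queries via cross-attention (Perceiver-style)."""

    def __init__(self, dim: int = 1024, num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, dim), e.g. hundreds of patch tokens
        batch = vision_feats.size(0)
        queries = self.queries.unsqueeze(0).expand(batch, -1, -1)
        out, _ = self.cross_attn(queries, vision_feats, vision_feats)
        return out  # (batch, num_queries, dim): a much shorter visual sequence


if __name__ == "__main__":
    resampler = QueryResampler()
    patches = torch.randn(2, 576, 1024)   # e.g. 24x24 ViT patch features
    compressed = resampler(patches)
    print(compressed.shape)               # torch.Size([2, 64, 1024])
```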
comment in response to post
One hypothesis for why simple mappers work: (1) unfreezing the LLM provides enough parameters for the mapping, and (2) richer vision representations are closer to the LLM's internal latent space (arxiv.org/abs/2405.07987).
comment in response to post
Good questions! From what I see, some folks still use complex mappers like Perceivers, but often a simple MLP works well enough. The variable that induces the biggest improvement is almost always the alignment data.
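(To make the "simple MLP mapper" concrete: here is a minimal sketch, assuming PyTorch; `MLPProjector` and its dimensions are hypothetical illustrations, not code from any of the linked papers. It simply projects each vision-encoder patch feature into the LLM's embedding space, leaving the visual token count unchanged.)

```python
import torch
import torch.nn as nn


class MLPProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM's embedding space.

    A 1:1 mapping keeps the visual token count unchanged, which is fine when
    the LLM handles long context; shrinking the sequence would require a
    heavier mapper such as a Perceiver-style resampler.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim)
        # returns:      (batch, num_patches, llm_dim), one "visual token" per patch
        return self.proj(vision_feats)


if __name__ == "__main__":
    projector = MLPProjector()
    patches = torch.randn(2, 576, 1024)   # e.g. 24x24 ViT patch features
    visual_tokens = projector(patches)
    print(visual_tokens.shape)            # torch.Size([2, 576, 4096])
```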
comment in response to post
This is actually a cool result: token length as a rough heuristic for model confidence?
comment in response to post
Lots of cool findings in our paper as well as on the website: tsb0601.github.io/metamorph/ Excited to see how the community "MetaMorph"'s existing LLMs!
comment in response to post
I wonder if veo-2 would be better at these prompts!
comment in response to post
Co-organized by @randomwalker.bsky.social, @peterhenderson.bsky.social, @in4dmatics.bsky.social, Naila Murray, @adinawilliams.bsky.social, Angela Fan, Mike Rabbat, and Joelle Pineau. Check out our website for the CFP and more details: reproml.org
comment in response to post
Also, MLRC is now on 🦋, do follow! :) @reproml.org
comment in response to post
Yes, that IMO is one of the most exciting outcomes of this direction: learning a new modality with much less compute. We have some really nice results, can’t wait to share them with everyone, stay tuned!
comment in response to post
👋 hello! :)
comment in response to post
Same here! Let's make a club! 😅