(1) Vision language models can explain complex charts & decode memes, but struggle with simple tasks young kids find easy - like counting objects or finding items in cluttered scenes! Our πŸ†’πŸ†• #NeurIPS2024 paper shows why: they face the same 'binding problem' that constrains human vision! πŸ§΅πŸ‘‡
Post image

Comments