(1) Vision language models can explain complex charts & decode memes, but struggle with simple tasks young kids find easy - like counting objects or finding items in cluttered scenes! Our #NeurIPS2024 paper shows why: they face the same 'binding problem' that constrains human vision! 🧵