(3) To investigate this, we tested VLMs on classic visual search tasks. They excel at finding unique objects (e.g., one green shape among red shapes 🔴🟢🔴🔴). But searching for specific feature combinations? Performance drops substantially - similar to people when under time pressure. - ThreadSky

About ThreadSky

(3) To investigate this, we tested VLMs on classic visual search tasks. They excel at finding unique objects (e.g., one green shape among red shapes 🔴🟢🔴🔴). But searching for specific feature combinations? Performance drops substantially - similar to people when under time pressure.

Comments

thisisadax.bsky.social•138 days ago

(4) Both multimodal LMs & text-to-image models show strict capacity limits - similar to human 'subitizing' limits during rapid parallel processing. Key finding: They improve with visually distinct objects, suggesting failures stem from feature interference.

thisisadax.bsky.social•138 days ago

(5) We developed a scene description benchmark inspired by visual working memory tasks to more directly evaluate how feature overlap affects performance. Key finding: Errors spike when objects share overlapping features - driven by 'illusory conjunctions' where features get mixed up!

thisisadax.bsky.social•138 days ago

(6) Finally, we found that breaking 🪚🔨 visual analogy tasks into smaller chunks (i.e. performing object segmentation) to mitigate the influence of feature interference improves performance on those tasks.

thisisadax.bsky.social•138 days ago

(7) The punchline? Capacity limits aren't just about the number of objects - they stem from interference between representations when processing multiple things at once. This 'binding problem' creates fundamental constraints on parallel processing in both humans and VLMs🧍‍♂️🤖 .

thisisadax.bsky.social•138 days ago

(8) This work wouldn't have been possible without my amazing collaborators Sunayana Rane, Tyler Giallanza, Nicolò De Sabbata, Kia Ghods, Amogh Joshi, Alexander Ku, @frankland.bsky.social, @cocoscilab.bsky.social, Jonathan Cohen, and @taylorwwebb.bsky.social.

neurostats.org•120 days ago

📌

Posting Rules

Be respectful to others
No spam or self-promotion
Stay on topic
Follow Bluesky's terms of service

Comments

Posting Rules

Reply