(3) To investigate this, we tested VLMs on classic visual search tasks. They excel at finding unique objects (e.g., one green shape among red shapes 🔴🟢🔴🔴). But searching for specific feature combinations? Performance drops substantially, much like human performance under time pressure.