(3) To investigate this, we tested VLMs on classic visual search tasks. They excel at finding a unique object (e.g., one green shape among red shapes: 🔴🟢🔴🔴). But searching for a specific combination of features? Performance drops substantially, much as it does for people under time pressure.
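To make the contrast concrete, here is a minimal Python sketch of the two classic display types. In feature search the target differs from every distractor in one feature, so that feature alone finds it. In conjunction search the target shares its color with some distractors and its shape with others, so only the combination identifies it. The function names, the red/green circle/square stimuli, and the `target_pops_out` check are hypothetical illustrations, not the stimulus code used in the study.

```python
import random
from typing import List, Tuple

Item = Tuple[str, str]  # (color, shape); hypothetical stimulus encoding


def feature_search(n_distractors: int, rng: random.Random) -> Tuple[List[Item], int]:
    """Green circle among red circles: the target differs in color only."""
    display: List[Item] = [("red", "circle")] * n_distractors
    idx = rng.randrange(n_distractors + 1)
    display.insert(idx, ("green", "circle"))
    return display, idx


def conjunction_search(n_distractors: int, rng: random.Random) -> Tuple[List[Item], int]:
    """Green circle among red circles AND green squares."""
    if n_distractors < 2:
        raise ValueError("need at least one distractor of each kind")
    half = n_distractors // 2
    display: List[Item] = [("red", "circle")] * half + [("green", "square")] * (n_distractors - half)
    rng.shuffle(display)
    idx = rng.randrange(n_distractors + 1)
    display.insert(idx, ("green", "circle"))
    return display, idx


def target_pops_out(display: List[Item], idx: int) -> bool:
    """True if the target's color OR shape is shared with no distractor,
    i.e. a single feature is enough to locate it."""
    color, shape = display[idx]
    others = display[:idx] + display[idx + 1:]
    return all(c != color for c, _ in others) or all(s != shape for _, s in others)


if __name__ == "__main__":
    rng = random.Random(0)
    disp, t = feature_search(7, rng)
    print(target_pops_out(disp, t))  # True: color alone finds the target
    disp, t = conjunction_search(8, rng)
    print(target_pops_out(disp, t))  # False: only the color+shape combination does
```

The `target_pops_out` result does not depend on the random seed: it is always `True` for the feature displays and always `False` for the conjunction displays, which is exactly the property that separates the two task types.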