Very nice analysis of the important role that visual perception plays in ARC problems. The ability of LLMs to solve these problems is dramatically affected by the size of the grid, even for otherwise identical problems. https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi
Comments
I think visual reasoning performance should be pretty good once the visual parts catch up with the reasoning parts and complex images are properly tokenized with 2D positional data.
GPT-4o can't even do that itself for an example grid in the format they used to input the data.