There are now several benchmarks testing spatial reasoning and agent capabilities of LLMs and VLMs:
- https://arxiv.org/abs/2410.06468 (does spatial cognition ...)
- https://arxiv.org/abs/2307.06281 (MMBench)
- https://arxiv.org/abs/2411.13543 (BALROG) - additional points for the LOTR ref.
Comments
https://arxiv.org/abs/2410.07765