New preprint! Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs.
In this paper with @kesnet50.bsky.social and my advisor Armando Solar-Lezama, we investigate how LLMs perform on randomly selected simple language reasoning problems.
https://arxiv.org/abs/2501.02825
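
To make the setup concrete, here is a minimal sketch of what a randomly sampled language reasoning problem could look like, assuming the tasks resemble membership queries over randomly generated DFAs; the paper's exact task format may differ, and all names below are illustrative.

```python
import random

# Hedged sketch: one plausible way to generate a "randomly sampled language
# reasoning problem". This assumes a random-DFA membership task; it is an
# illustration, not the paper's actual pipeline.

def random_dfa(num_states=4, alphabet="ab", seed=0):
    """Sample a random DFA: uniform transitions, each state accepting w.p. 1/2."""
    rng = random.Random(seed)
    transitions = {
        (state, symbol): rng.randrange(num_states)
        for state in range(num_states)
        for symbol in alphabet
    }
    accepting = {s for s in range(num_states) if rng.random() < 0.5}
    return transitions, accepting

def accepts(transitions, accepting, word):
    """Run the DFA from state 0 and check whether it halts in an accepting state."""
    state = 0
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting

# Sample one problem instance: a random DFA plus a random query string.
transitions, accepting = random_dfa(seed=42)
word = "".join(random.Random(1).choice("ab") for _ in range(8))
print(word, "accepted?", accepts(transitions, accepting, word))
```

Because the automaton and query are drawn at random, instances like this are essentially guaranteed to be absent from training data, which is what makes them a clean probe of reasoning rather than recall.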
Comments
I'm also curious why LLMs don't already have this ability. If we trained them on related data, would they gain this skill, and would it generalize?
Would love to see the test results for the LLMs specifically marketed as reasoning-focused (Gemini 2.0 Flash Thinking, OpenAI o1 (pro), DeepSeek Thinking), though.