LLMs can do this to some degree, but they struggle more than humans unless they are given many in-context examples (and sometimes even then), and they definitely have problems executing algorithms given in prompts.
Thanks for explaining, but I'm still confused. LLMs succeed regularly at following complex natural-language instructions without examples - it's their bread and butter. I agree they sometimes have problems executing algorithms consistently (unless fine-tuned to do so), but so do untrained humans.
Many studies have found that when the instructions are for a task sufficiently different from the training data, LLMs have trouble following them, and they fail in ways that seem (or are) un-humanlike.
There are newer models that do better on some of the examples in these papers, but it's not clear that they are doing it in a general way -- these kinds of robustness studies remain to be done on newer (e.g., "reasoning") models.
E.g.,
https://openreview.net/pdf?id=t5cy5v9wph
https://arxiv.org/abs/2309.13638
https://machinelearning.apple.com/research/gsm-symbolic
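To make "sufficiently different from the training data" concrete: one probe along the lines of the second paper above compares the same algorithm under a common versus a rare parameter setting (e.g., a shift-13 cipher, which is all over web text, versus a shift-2 cipher, which isn't). The sketch below is only an illustration, not code from any of these papers; `call_model` is a placeholder for whatever LLM API you would use.

```python
def shift_encode(text: str, shift: int) -> str:
    """Apply a simple Caesar shift to lowercase letters; leave other characters alone."""
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

def make_prompt(ciphertext: str, shift: int) -> str:
    # Identical instructions and algorithm for both variants; only the shift value differs.
    return (
        "Decode the following text, which was encoded by shifting each letter "
        f"{shift} positions forward in the alphabet. Reply with only the decoded text.\n"
        f"{ciphertext}"
    )

plaintext = "the quick brown fox jumps over the lazy dog"
for shift in (13, 2):  # shift-13 (rot13) is common online; shift-2 is rare
    prompt = make_prompt(shift_encode(plaintext, shift), shift)
    # answer = call_model(prompt)                     # placeholder LLM call (hypothetical)
    # print(shift, answer.strip().lower() == plaintext)
    print(f"--- shift {shift} ---\n{prompt}\n")
```

If instruction following were fully general, accuracy on the two variants should be about the same, since the prompt spells out the same procedure in both cases; the papers above report large gaps of this kind.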