LLMs can do this to some degree, but they struggle more than humans unless they are given many in-context examples (and sometimes even then), and they definitely have problems executing algorithms given in prompts.
Thanks for explaining, but I'm still confused. LLMs succeed regularly at following complex natural-language instructions without examples - it's their bread and butter. I agree they sometimes have problems executing algorithms consistently (unless fine-tuned to do so), but so do untrained humans.
Many studies have found that when the instructions are for a task sufficiently different from the training data, LLMs have trouble following them, and they fail in ways that seem (or are) un-humanlike.
There are newer models that do better on some of the examples in these papers, but it's not clear that they are doing it in a general way -- these kinds of robustness studies remain to be done on newer (e.g., "reasoning") models.
E.g.,
https://openreview.net/pdf?id=t5cy5v9wph
https://arxiv.org/abs/2309.13638
https://machinelearning.apple.com/research/gsm-symbolic
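To make "sufficiently different from the training data" concrete: one probe along the lines of the second paper above compares the same algorithm under a common versus a rare parameter setting (e.g., a shift-13 cipher, which is all over web text, versus a shift-2 cipher, which isn't). The sketch below is only an illustration, not code from any of these papers; `call_model` is a placeholder for whatever LLM API you would use.

```python
def shift_encode(text: str, shift: int) -> str:
    """Apply a simple Caesar shift to lowercase letters; leave other characters alone."""
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

def make_prompt(ciphertext: str, shift: int) -> str:
    # Identical instructions and algorithm for both variants; only the shift value differs.
    return (
        "Decode the following text, which was encoded by shifting each letter "
        f"{shift} positions forward in the alphabet. Reply with only the decoded text.\n"
        f"{ciphertext}"
    )

plaintext = "the quick brown fox jumps over the lazy dog"
for shift in (13, 2):  # shift-13 (rot13) is common online; shift-2 is rare
    prompt = make_prompt(shift_encode(plaintext, shift), shift)
    # answer = call_model(prompt)                     # placeholder LLM call (hypothetical)
    # print(shift, answer.strip().lower() == plaintext)
    print(f"--- shift {shift} ---\n{prompt}\n")
```

If instruction following were fully general, accuracy on the two variants should be about the same, since the prompt spells out the same procedure in both cases; the papers above report large gaps of this kind.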