How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️ - ThreadSky

lauraruis.bsky.social • 98 days ago

How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️

Comments

gulbaru.bsky.social•97 days ago

I am not really convinced, that there is actual reasoning to be found in LLMs. The free version of ChatGPT still fails miserably on simple riddles, like finding the number of Tom's sisters, if his sister Susan has two sisters. LLMs are good at looking as if they were reasoning, I'll give them that.

ccozianu.bsky.social•95 days ago

Do you happen to have an operational definition for "reasoning" , especially in the LLM context ?

woody1953.bsky.social•97 days ago

Remind me to never get into an argument with you! Ever!

lauraruis.bsky.social•97 days ago

hey btw don't get into an argument with me

woody1953.bsky.social•96 days ago

Never! Thanks!

vaishnavh.bsky.social•96 days ago

Was scrolling down this thread to learn more about the paper and I'm glad I did 😂😂😂

calligrapher.bsky.social•97 days ago

How does one distinguish between reason and following the rules of grammar?

lauraruis.bsky.social•97 days ago

The latter is an instance of the former

indigoapplepie.bsky.social•97 days ago

📌

youraibigsis.bsky.social•96 days ago

📌

kobsthegreat.bsky.social•97 days ago

Hold up, so we don’t actually know how LLMs reason ? 🤯
Or is the preprint suggesting reasoning is more nuanced than originally thought ?

howidiotami.bsky.social•97 days ago

LLMs dont reason.. they parrot...

heyimdandan.bsky.social•97 days ago

My experience of LLM’s seems to be the same as yours. They summarise existing information, but the idea of synthesis I have observed but when you compare and contrast ideas I would imagine the algorithm has enough points of reference to summarise so it seems like intelligence.

howidiotami.bsky.social•97 days ago

I examined AI theory in depth.. IMHO, as it is, LLM models and generally speaking AI, is just an enormopusly complex functional optimised w.r.t. given criteria with its previous training data: as such, dont expect any "reasoning"...

howidiotami.bsky.social•97 days ago

the missing points are "memory" and "emotions" i.e. temporal feedback from errors committed and what I would call "guts" feeling respectively.. by "guts" I mean the biological "sense of being" and the associated fear for its preservation.. as long as these factors are missing forget about GI...

nerd-on-fire.bsky.social•96 days ago

I didn't realize they were "reasoning" at all. I thought they were overhyped predictive word generators operating on probabilities and randomness.

nerd-on-fire.bsky.social•96 days ago

https://www.youtube.com/watch?v=6ROlMFlbkWE

emilevankrieken.com•97 days ago

This is a very interesting result! I like how your conclusion is carefully phrased not to say that this means the LLMs also perform these procedures :-)

lauraruis.bsky.social•98 days ago

Since LLMs entered the stage, there has been a hypothesis prevalent:

When LLMs are reasoning, they are doing some form of approximate retrieval where they “retrieve” the answer to intermediate reasoning steps from parametric knowledge, as opposed to doing “genuine” reasoning.

lauraruis.bsky.social•98 days ago

This is not unreasonable to think given the trillions of tokens LLMs are trained on, their high capacity for memorisation, the well-documented issues with data contamination of evaluation benchmarks, and the prompt-dependent nature of LLM reasoning.

lauraruis.bsky.social•98 days ago

However, most studies don’t look at the pretraining data when they conclude models aren’t genuinely reasoning. In this project we wondered: even if the answers to reasoning steps are in the data, is the model relying on them when producing reasoning traces?

lauraruis.bsky.social•98 days ago

We use influence functions to estimate the effect pretraining data have on the likelihood of completions of two LLMs (7B and 35B) for factual question answering (left), and reasoning traces for simple mathematical tasks (3 tasks, one shown right).

lauraruis.bsky.social•98 days ago

To my surprise, we find the opposite of what I thought when we started this project:

The approach to reasoning LLMs use looks unlike retrieval, and more like a generalisable strategy synthesising procedural knowledge from many documents doing a similar form of reasoning.

lauraruis.bsky.social•98 days ago

This is based on an analysis of the influence of 5M pretraining documents (covering 2.5B tokens) on factual questions, arithmetic, calculating slopes, and linear equations. All in all, we did **a billion** LLM-sized gradient dot products for this work 🧮

deepchatbot.bsky.social•97 days ago

Makes a lot of sense. Especially when working with them every day.

drsuzanne.bsky.social•98 days ago

They are using the same methods for synthesizing information as the data they are gathering is using?

giffmana.ai•98 days ago

Props to you for not driving your initial agenda, but instead following the evidence!

jnmulder.bsky.social•97 days ago

LLM’s can only “reason” from data they’ve seen before. Humans can reason their way thru situations they’ve never encountered before. Isn’t that proof LLM’s aren’t capable of genuine reasoning?

lauraruis.bsky.social•97 days ago

> LLMs can only "reason" from data they've seen before
do you pointers for evidence of this statement? Your proof relies on it and I don't think it's true

jnmulder.bsky.social•96 days ago

Would you like the AI answer?😛

hypr-sean.bsky.social•97 days ago

Very cool work, I’m glad some one is working on this. Did you find this to be architecture/embedding agnostic?

lauraruis.bsky.social•97 days ago

Thank you! We only tried one architecture unfortunately because it is a very compute-intense study. Would be exciting if future work tests other architectures!

gorkaurbizu.bsky.social•96 days ago

📌

stonewolfpc.bsky.social•97 days ago

Is it possible to teach an llm the whole of human knowledge and then teach it not to plagiarism or use other people's art? Your question is more important, but if someone gets time after answering yours... I'm interested.

hplisiecki.bsky.social•97 days ago

Extremely interesting
It's tough to navigate this discussion without predefining the terms tho. Here https://arxiv.org/abs/2403.17125 authors show that multiple shot prompting works not because you "teach llm to reason" but because you prime it with regards to data it already seen. Reasoning =\= Reasoning

lauraruis.bsky.social•97 days ago

thanks for sharing, added it to my reading list :) we look at zero-shot though!

hplisiecki.bsky.social•97 days ago

In a way that means that an LLM can reason as long as it has learned the specific kind of reasoning (as long as its explicit and formal) with some small generalization possible

dr-streich.bsky.social•96 days ago

Love the paper, just read the whole thing.

renee-betterson.bsky.social•97 days ago

Best thread 🥇

johnegan.bsky.social•94 days ago

hi laura,

am trying to develop options for probabilistic firewalls
Q: what is/are the best security measure(s) that you are aware of to help stop or mitigate probabilistic injection ?
the simplest form of probabilistic injection is a ‘prompt injection’

lauraruis.bsky.social•93 days ago

Hey John! This is not my area unfortunately, but a pretty easy way to get an answer to such questions is to ask ChatGPT with websearch on. This can give you some initial pointers of what might be the current SotA and from there you can do you own research into what kind of papers look into this!

johnegan.bsky.social•93 days ago

ask the probabilistic tech slop how to protect my clients from probabilistic tech slop 🤔

an iterative flip around and find out !😎

appreciate the chuckle

☮️ peace

https://medium.com/@john_94579/probability-injection-some-coin-flips-are-more-equal-than-others-a-feature-not-a-bug-dda9b77f2f54

lauraruis.bsky.social•92 days ago

didnt mean to ask chatgpt how to do it, but rather ask chatgpt for pointers to papers on the topic ;)

johnegan.bsky.social•92 days ago

there are papers that describe how probabilistic firewalls can be hacked into dust but that is it

wesleypasfield.bsky.social•87 days ago

Thanks for sharing this excellent work, and providing all the additional context in the thread!

thomvsa.bsky.social•97 days ago

📌

marshallgetto.bsky.social•97 days ago

I actually built and fed an AI myself while recovering from stoke. It was interesting

andreapanizza.bsky.social•88 days ago

@xeophon.bsky.social @shubhendu.bsky.social

shubhendu.bsky.social•88 days ago

Thanks! Saw that! Presented that paper just a couple of days ago, in fact!

lauraruis.bsky.social•87 days ago

What did you think

epicquotes.bsky.social•97 days ago

- If I have ever made any valuable discoveries, it has been due more to patient attention, than to any other talent - Isaac Newton
Join for more: http://epic-quotes.org

timkellogg.me•87 days ago

link: https://arxiv.org/abs/2411.12580

vickiboykis.com•87 days ago

📌

lazermonkmusic.bsky.social•97 days ago

❤️❤️❤️

hoffart.ai•98 days ago

This is interesting, thanks for sharing! Curious what you think of this idea: Artificially generate logical reasoning chains e.g. in first order logic, both in logic notation and natural language (e.g. all humans are mortal, Socrates is a human, hence Socrates is mortal), then pre-train on those?

burke.su•96 days ago

Chain of thought models already are known to solve problems better, but to go as far as trying to coerce an LLM into reasoning first order logic? That's something an expert system is better suited towards. No need to reinvent the wheel.

doublepluskombucha.com•98 days ago

📌

rockt.ai•96 days ago

gravity7.bsky.social•98 days ago

If I might ask you a question from intuition: If you were to want reasoning from a customized model for, say, business management, would you expect to find it in transcripts of business conversation, business methods documents, or perhaps lectures?

gravity7.bsky.social•98 days ago

And would you be inclined to prompt such an LLM for a written explanation of business reasons (for doing X or Y), or engage conversationally, and if so, would doing so as a business personality make a difference?

gravity7.bsky.social•98 days ago

Are you secure that the reasoning traces, and use of procedural reasoning methods comes from data/documents and not from reward/policy scaffolding (common ground, sycophancy effects)?

gravity7.bsky.social•98 days ago

Do you know of any research into differences between exposing reasoning in a written/document form vs reasoning evident in LLM-based conversation?

gravity7.bsky.social•98 days ago

This would be an interesting test of whether LLM-based conversation, as a type of "communication" with the user, suppresses the types of reasoning that might be more evident in written explanations.

teracomb.bsky.social•97 days ago

I would also be interested in getting answers to this

mullenba.bsky.social•97 days ago

There's no reasoning involved in LLMs. It's a statistical model that determines the most likely response given an input. It's why LLMs hallucinate so convincingly. Human brains are designed to detect patterns, LLMs are designed to create those patterns. No reasoning involved.

Comments

Posting Rules

Reply