siyuansong.bsky.social
3rd yr Linguistics undergrad @utaustin.bsky.social
Comp Psycholing & CogSci, human-like AI, rock🎸
Prev: VURI@Harvard Psych, Undergrad@SJTU
Looking for a Ph.D. position for Fall '26
comment in response to
post
Thanks! The main purpose behind our design was to minimize underestimation of the models' abilities (p. 5 + App. A). Our methods may have reduced the metalinguistic disadvantage of base models. Moreover, we did observe that instruct models are generally better than their base versions.
comment in response to
post
Of course, introspection might emerge in even larger models or on different tasks. Due to computational and methodological constraints (we need access to the full probability distribution), we didn't include closed-source models like Claude 3.7 or GPT-4.5, so that remains an open question for future work!
comment in response to
post
There is an interesting scaling trend: responses to metalinguistic prompts from larger models align better with the answers from direct probability measurement (Fig. 2b/6/9). But even when considering only large models (≥70B), we still didn't find evidence of introspection.
comment in response to
post
My take on this: yes, we did observe signs of low consistency; alignment between the direct and meta methods was indeed low.
But our focus here is more on the *within-model vs. across-model* correlation, which I think is different from the consistency typically discussed.
comment in response to
post
Full Paper: arxiv.org/abs/2503.07513
Again, thanks to my amazing advisors @jennhu.bsky.social and @kmahowald.bsky.social for their guidance and support! (8/8)
comment in response to
post
Our findings offer a cautionary data point against recent results suggesting models can introspect.
There is also a takeaway for linguistics: meta-linguistic prompting does not necessarily tap into the linguistic generalizations reflected in an LLM’s internal model of language. (7/8)
comment in response to
post
However, the consistency between these measures is low (kappa ≈ 0.25 for Experiment 1). And within-model correlation is not meaningfully higher than across-model correlation when we consider relevantly similar models, like random-seed variants (see plot below for our breakdown of “similar” models). (6/8)
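For readers who want to see the consistency check concretely, here is a minimal sketch, assuming the kappa in question is Cohen's kappa and using made-up toy choice vectors (not our data):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Toy forced-choice outcomes (1 = picked the first option of the pair).
# These values are invented purely to illustrate the computation.
direct_choices = np.array([1, 1, 0, 1, 0, 1, 1, 0])  # from log-probability comparison
meta_choices   = np.array([1, 0, 0, 1, 1, 1, 0, 0])  # from meta-linguistic prompting

# Chance-corrected agreement between the two measurement methods.
kappa = cohen_kappa_score(direct_choices, meta_choices)
print(f"kappa = {kappa:.2f}")
```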
comment in response to
post
We see that meta-linguistic prompting and direct measurement of probabilities both reflect grammatical knowledge. Accuracy is high for both methods (and meta-linguistic accuracy is higher than direct accuracy for larger models). (5/8)
comment in response to
post
We test this in two linguistically informed domains: grammaticality judgments and word prediction. We set both up as forced-choice tasks and get (a) direct log-probability measurements and (b) prompted (meta-linguistic) responses. We compare (a) and (b) both within the same model and across models. (4/8)
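Roughly, the two measurements look like this. A minimal sketch with Hugging Face transformers; the model ("gpt2"), the minimal pair, and the prompt wording are illustrative placeholders, not our exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; any open causal LM with accessible logits works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log probability the model assigns to a sentence."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens.
    return -out.loss.item() * (ids.shape[1] - 1)

# (a) Direct measurement: which member of a minimal pair gets higher probability?
good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
direct_choice = 1 if sentence_logprob(good) > sentence_logprob(bad) else 2

# (b) Prompted measurement: ask a meta-linguistic question and compare the
# model's scores for the two answer tokens (prompt wording is hypothetical).
prompt = (
    "Which sentence is grammatically acceptable?\n"
    f"1) {good}\n"
    f"2) {bad}\n"
    "Answer:"
)
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    next_logits = model(ids).logits[0, -1]
id_1 = tok(" 1", add_special_tokens=False).input_ids[0]
id_2 = tok(" 2", add_special_tokens=False).input_ids[0]
# Comparing logits at the same position is equivalent to comparing probabilities.
meta_choice = 1 if next_logits[id_1] > next_logits[id_2] else 2

print(direct_choice, meta_choice)
```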
comment in response to
post
We propose a new measure of introspection: the degree to which a model’s prompted responses predict its own string probabilities, beyond what would be predicted by another model with *nearly identical* internal knowledge. (3/8)
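To make the idea concrete, here is a minimal sketch of that comparison, using simple agreement rates as a stand-in for the correlation analysis in the paper (function and variable names are mine, not the paper's):

```python
import numpy as np

def agreement(meta_choices, direct_choices):
    """Fraction of forced-choice items where the prompted (meta) answer
    matches the answer implied by direct log-probability comparison."""
    return float(np.mean(np.asarray(meta_choices) == np.asarray(direct_choices)))

def introspection_gap(meta_A, direct_A, meta_B):
    """Within-model predictivity (model A's prompted answers vs. A's own
    probability-based answers) minus across-model predictivity (model B's
    prompted answers vs. A's probability-based answers). A clearly positive
    gap, with B internally very similar to A, would suggest introspection."""
    return agreement(meta_A, direct_A) - agreement(meta_B, direct_A)
```

The point of subtracting the across-model term is that the comparison model is chosen to be nearly identical internally (e.g., a random-seed variant), so any remaining gap reflects genuinely self-specific information.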
comment in response to
post
Why should we care if LLMs introspect?
Practical reasons: an LLM that can report its internal states would be safer + more reliable.
Scientific reasons: we shouldn't use meta-linguistic prompts (e.g., acceptability judgments) with LLMs unless they can introspect about their linguistic knowledge! (2/8)