nsaphra.bsky.social
Waiting on a robot body. All opinions are universal and held by both employers and family.
Recruiting students to start my lab!
ML/NLP/they/she.
1,992 posts
9,307 followers
1,388 following
Regular Contributor
Active Commenter
comment in response to
post
lol it gets better, thanks for the laugh
comment in response to
post
I recently looked at European programs for poaching US scientists. The bait funding for “defectors” was under 1/3 of my startup package in a *terrible* year for the US faculty market. If European countries are going to expand their science community, they need to step up funding.
comment in response to
post
That was, of course, the summer Vienna Teng composed the Hymn of Acxiom, an incredibly eerie piece sung from the perspective of an AI system for personalized advertising.
comment in response to
post
See x.com/colin_fraser...
comment in response to
post
I was really upset that I couldn’t make it to the Naomis-only mass book club meetup in Central Park for discussing Naomi Klein’s Doppelgänger.
comment in response to
post
If this flag offends you I’ll help you pack 😏
comment in response to
post
I’ll wait until they provide non-“preliminary” results. Like I want to see the actual corrected experiment results, not “we found some individual examples of understated performance.”
comment in response to
post
what a cool department and getting cooler
comment in response to
post
Thanks to @ml-collective.bsky.social for enabling yet another random international collaboration. See you in Vienna!
comment in response to
post
Why look at multiple domains? Because we want interpretability researchers to think about the latent structure underlying their data! Even if it means they need to start working with domain experts in interdisciplinary ways.
comment in response to
post
Not only do vowels interact more with surrounding acoustic features, but so do the most vowel-like consonants (bottom line in the consonant chart). In voiced/unvoiced pairs like b/p, the highly turbulent voiced phonemes are also more contextual than their unvoiced counterparts!
comment in response to
post
Next we jump from analyzing text models to predictive speech models! Phoneticians have claimed for decades that humans rely more on contextual cues when processing vowels compared to consonants. Turns out so do speech models!
comment in response to
post
Syntax isn't the only structure we find! In idiomatic multiword expressions like "kick the bucket", the meaning of "kick" is more dependent on "bucket" than in "kick the ball". Accordingly, idiom tokens interact more than non-idiomatic tokens!
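To make that concrete, here is a minimal sketch of one way a token-pair interaction score could be estimated via second-order ablation. The `log_prob_target` scorer below is a random stub standing in for a real language model call, and none of these names come from the paper itself; it is just an illustration of the kind of comparison being described.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob_target(tokens, ablated=frozenset()):
    """Stand-in for a real LM call: log-probability of the target token given
    the context, with some positions ablated (e.g. masked out). Stubbed with
    noise so the sketch runs on its own."""
    return float(rng.normal(-2.0, 0.1)) - 0.05 * len(ablated)

def pairwise_interaction(tokens, i, j):
    """Second-order interaction between positions i and j: how far the joint
    ablation effect deviates from the sum of the individual effects. Near zero
    means the two tokens contribute additively; large means they interact."""
    base    = log_prob_target(tokens)
    drop_i  = log_prob_target(tokens, frozenset({i}))
    drop_j  = log_prob_target(tokens, frozenset({j}))
    drop_ij = log_prob_target(tokens, frozenset({i, j}))
    return abs(drop_ij - drop_i - drop_j + base)

idiom   = "he kicked the bucket last night".split()
literal = "he kicked the ball last night".split()

# Compare "kicked" <-> "bucket" against "kicked" <-> "ball" (positions 1 and 3).
print("idiom  :", pairwise_interaction(idiom, 1, 3))
print("literal:", pairwise_interaction(literal, 1, 3))
```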
comment in response to
post
In autoregressive models specifically, we see that nonlinear interaction is correlated consistently with syntactic relatedness between tokens (for a given token distance).
comment in response to
post
First, we show that in both autoregressive and masked language models, contextual token pairs interact less as we increase the distance between the paired tokens (d_i) and as we increase the distance from the predicted target token (d_p).
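As a rough illustration of the two distance variables, here is a sketch of how interaction scores might be bucketed by d_i and d_p across token pairs. `interaction_score` is again a random stand-in, not the measure used in the paper.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def interaction_score(tokens, i, j, p):
    """Stand-in for a real interaction measure between context positions i and
    j when predicting the token at position p (e.g. an ablation- or
    gradient-based score from an actual model)."""
    return float(rng.random())

def bucket_by_distance(sentences):
    """Aggregate interaction scores by (d_i, d_p):
    d_i = distance between the paired context tokens,
    d_p = distance from the pair to the predicted target token."""
    buckets = defaultdict(list)
    for tokens in sentences:
        p = len(tokens) - 1  # predict the final token
        for i in range(p):
            for j in range(i + 1, p):
                d_i = j - i
                d_p = p - j
                buckets[(d_i, d_p)].append(interaction_score(tokens, i, j, p))
    return {k: float(np.mean(v)) for k, v in buckets.items()}

sentences = [
    "the cat sat on the mat".split(),
    "she kicked the bucket yesterday".split(),
]
for (d_i, d_p), mean_score in sorted(bucket_by_distance(sentences).items()):
    print(f"d_i={d_i} d_p={d_p} mean interaction={mean_score:.3f}")
```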
comment in response to
post
I understand that. I’m not talking about multi-epoch training, just high-frequency examples (compared to humans exposed to the same examples at a similar interspersed frequency of exposure).
comment in response to
post
Humans rarely memorize after one or two exposures to a novel sequence unless they recite or mentally replay the text, artificially increasing the number of exposures (which LMs cannot do during supervised training). You’re describing something that humans also do not do.
comment in response to
post
Yeah I understand for ICL. I’m asking for evidence of the claim about weight learning, late in training specifically (since early in training they’re still learning pretty simple co-occurrence structures)
comment in response to
post
I want to see work comparing them to humans, exposed at similar intervals to the same examples, late in training / during finetuning. If you ask humans to memorize something specific, they are literally replaying it mentally to allow that memorization. Humans rarely memorize after brief exposures.
comment in response to
post
Ok can you link a result comparing human and LM performance at memorization and retrieval? I literally have no idea what situation you’re describing even for training examples.
comment in response to
post
Sorry, can you link a paper so I understand what situations you are contrasting?
comment in response to
post
All this leads me to: it doesn’t matter how good LLMs are at the “information retrieval” version of reasoning tasks (e.g., explaining algorithms). No variable binding, no grounded reasoning, and rather than helping, raw scale seems to hurt grounding. We can’t be surprised by results like these.
comment in response to
post
Finally, after induction heads form and ICL becomes possible, models get progressively less human-like in their predictions, leading to major divergence in larger scale models. You cannot evaluate these models like humans, because they are not remotely humanlike; they are far better at retrieval.
comment in response to
post
In actual language models, models move from solving ICL examples with induction heads, which enable algorithmic solutions that are grounded by context, to function vector heads, which retrieve specific known tasks.
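For anyone who wants to poke at this themselves, here is a rough behavioral probe in the spirit of the standard repeated-random-tokens test for induction-style copying, assuming PyTorch, the transformers library, and GPT-2 purely as an example model. It only checks whether the model gets better at predicting a sequence it has already seen once in context; it does not separate induction heads from function vector heads.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Build a random token sequence repeated twice: if induction-style copying is
# at work, the second copy should be much easier to predict than the first,
# since every token has already appeared once earlier in the context.
vocab_size = model.config.vocab_size
torch.manual_seed(0)
prefix = torch.randint(0, vocab_size, (1, 50))
input_ids = torch.cat([prefix, prefix], dim=1)

with torch.no_grad():
    logits = model(input_ids).logits

# Per-token negative log-likelihood of each next token.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
targets = input_ids[:, 1:]
token_loss = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

first_copy = token_loss[:, :49].mean().item()
second_copy = token_loss[:, 50:].mean().item()
print(f"loss on first copy : {first_copy:.2f}")
print(f"loss on second copy: {second_copy:.2f}  (lower = more in-context copying)")
```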
comment in response to
post
In controlled settings, models switch from grounded in-context learning to generic information retrieval during training.
comment in response to
post
“Recognizing that a problem is literally describing Hanoi” doesn’t require reasoning even in humans. There are only so many ways to describe that exact task beyond swapping out the object types.
comment in response to
post
Wow, that’s terrible framing from the article. A model loses the thread of a long task because our current models are bad at generalized algorithm execution, which should be easy for a computer! It doesn’t “get bored”.
comment in response to
post
There is no novel way to ask about Towers of Hanoi that isn’t pretty recognizable.
comment in response to
post
If you pick an example that models fail at because they cannot execute an algorithm generally, while human children fail at it because they get bored, you have not shown the model is actually reasoning as well as a human.
comment in response to
post
What is hard for models and hard for humans are different things. It is extremely easy for a model to produce an explanation through information retrieval and light paraphrasing. Memorizing explanations is hard only for humans, easy for LLMs. For a model, it does not require even an ounce of reasoning.