nsaphra.bsky.social
Waiting on a robot body. All opinions are universal and held by both employers and family.
Recruiting students to start my lab!
ML/NLP/they/she.
1,992 posts
9,307 followers
1,388 following
Regular Contributor
Active Commenter
comment in response to
post
lol it gets better, thanks for the laugh
comment in response to
post
I recently looked at European programs for poaching US scientists. The bait funding for “defectors” was under 1/3 of my startup package in a *terrible* year for the US faculty market. If European countries are going to expand their science community, they need to step up funding.
comment in response to
post
That was, of course, the summer Vienna Teng composed the Hymn of Acxiom, an incredibly eerie piece sung from the perspective of an AI system for personalized advertising.
comment in response to
post
See x.com/colin_fraser...
comment in response to
post
I was really upset that I couldn’t make it to the Naomis-only mass book club meetup in Central Park for discussing Naomi Klein’s Doppelgänger.
comment in response to
post
If this flag offends you I’ll help you pack 😏
comment in response to
post
I’ll wait until they provide non-“preliminary” results. Like I want to see the actual corrected experiment results, not “we found some individual examples of understated performance.”
comment in response to
post
what a cool department and getting cooler
comment in response to
post
Thanks to @ml-collective.bsky.social for enabling yet another random international collaboration. See you in Vienna!
comment in response to
post
Why look at multiple domains? Because we want interpretability researchers to think about the latent structure underlying their data! Even if it means they need to start working with domain experts in interdisciplinary ways.
comment in response to
post
Not only do vowels interact more with surrounding acoustic features, but so do the most vowel-like consonants (bottom line in the consonant chart). In voiced/unvoiced pairs like b/p, the highly turbulent voiced phonemes are also more contextual than their unvoiced counterparts!
comment in response to
post
Next we jump from analyzing text models to predictive speech models! Phoneticians have claimed for decades that humans rely more on contextual cues when processing vowels compared to consonants. Turns out so do speech models!
comment in response to
post
Syntax isn't the only structure we find! In idiomatic multiword expressions like "kick the bucket", the meaning of "kick" is more dependent on "bucket" than in "kick the ball". Accordingly, idiom tokens interact more than non-idiomatic tokens!
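To make that concrete, here is a minimal sketch of one way a token-pair interaction score could be estimated via second-order ablation. The `log_prob_target` scorer below is a random stub standing in for a real language model call, and none of these names come from the paper itself; it is just an illustration of the kind of comparison being described.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob_target(tokens, ablated=frozenset()):
    """Stand-in for a real LM call: log-probability of the target token given
    the context, with some positions ablated (e.g. masked out). Stubbed with
    noise so the sketch runs on its own."""
    return float(rng.normal(-2.0, 0.1)) - 0.05 * len(ablated)

def pairwise_interaction(tokens, i, j):
    """Second-order interaction between positions i and j: how far the joint
    ablation effect deviates from the sum of the individual effects. Near zero
    means the two tokens contribute additively; large means they interact."""
    base    = log_prob_target(tokens)
    drop_i  = log_prob_target(tokens, frozenset({i}))
    drop_j  = log_prob_target(tokens, frozenset({j}))
    drop_ij = log_prob_target(tokens, frozenset({i, j}))
    return abs(drop_ij - drop_i - drop_j + base)

idiom   = "he kicked the bucket last night".split()
literal = "he kicked the ball last night".split()

# Compare "kicked" <-> "bucket" against "kicked" <-> "ball" (positions 1 and 3).
print("idiom  :", pairwise_interaction(idiom, 1, 3))
print("literal:", pairwise_interaction(literal, 1, 3))
```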
comment in response to
post
In autoregressive models specifically, we see that nonlinear interaction is correlated consistently with syntactic relatedness between tokens (for a given token distance).
comment in response to
post
First, we show that in both autoregressive and masked language models, contextual token pairs interact less as we increase the distance between the paired tokens (d_i) and as we increase the distance from the predicted target token (d_p).
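As a rough illustration of the two distance variables, here is a sketch of how interaction scores might be bucketed by d_i and d_p across token pairs. `interaction_score` is again a random stand-in, not the measure used in the paper.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def interaction_score(tokens, i, j, p):
    """Stand-in for a real interaction measure between context positions i and
    j when predicting the token at position p (e.g. an ablation- or
    gradient-based score from an actual model)."""
    return float(rng.random())

def bucket_by_distance(sentences):
    """Aggregate interaction scores by (d_i, d_p):
    d_i = distance between the paired context tokens,
    d_p = distance from the pair to the predicted target token."""
    buckets = defaultdict(list)
    for tokens in sentences:
        p = len(tokens) - 1  # predict the final token
        for i in range(p):
            for j in range(i + 1, p):
                d_i = j - i
                d_p = p - j
                buckets[(d_i, d_p)].append(interaction_score(tokens, i, j, p))
    return {k: float(np.mean(v)) for k, v in buckets.items()}

sentences = [
    "the cat sat on the mat".split(),
    "she kicked the bucket yesterday".split(),
]
for (d_i, d_p), mean_score in sorted(bucket_by_distance(sentences).items()):
    print(f"d_i={d_i} d_p={d_p} mean interaction={mean_score:.3f}")
```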
comment in response to
post
I understand that. I’m not talking about multi-epoch training, just high-frequency examples (compared to humans exposed to the same examples at a similar interspersed frequency of exposure).
comment in response to
post
Humans rarely memorize after one or two exposures to a novel sequence unless they recite or mentally replay the text, artificially increasing the number of exposures (which LMs cannot do during supervised training). You’re describing something that humans also do not do.
comment in response to
post
Yeah I understand for ICL. I’m asking for evidence of the claim about weight learning, late in training specifically (since early in training they’re still learning pretty simple co-occurrence structures)
comment in response to
post
I want to see work comparing them to humans, exposed at similar intervals to the same examples, late in training / during finetuning. If you ask humans to memorize something specific, they are literally replaying it mentally to allow that memorization. Humans rarely memorize after brief exposures.
comment in response to
post
Ok can you link a result comparing human and LM performance at memorization and retrieval? I literally have no idea what situation you’re describing even for training examples.
comment in response to
post
Sorry, can you link a paper so I understand what situations you are contrasting?
comment in response to
post
All this leads me to: it doesn’t matter how good LLMs are at the “information retrieval” version of reasoning tasks (e.g., explaining algorithms). No variable binding, no grounded reasoning, and rather than helping, raw scale seems to hurt grounding. We can’t be surprised by results like these.
comment in response to
post
Finally, after induction heads form and ICL becomes possible, models get progressively less human-like in their predictions, leading to major divergence in larger scale models. You cannot evaluate these models like humans, because they are not remotely humanlike; they are far better at retrieval.
comment in response to
post
In actual language models, models move from solving ICL examples with induction heads, which enable algorithmic solutions that are grounded by context, to function vector heads, which retrieve specific known tasks.
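For anyone who wants to poke at this themselves, here is a rough behavioral probe in the spirit of the standard repeated-random-tokens test for induction-style copying, assuming PyTorch, the transformers library, and GPT-2 purely as an example model. It only checks whether the model gets better at predicting a sequence it has already seen once in context; it does not separate induction heads from function vector heads.

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Build a random token sequence repeated twice: if induction-style copying is
# at work, the second copy should be much easier to predict than the first,
# since every token has already appeared once earlier in the context.
vocab_size = model.config.vocab_size
torch.manual_seed(0)
prefix = torch.randint(0, vocab_size, (1, 50))
input_ids = torch.cat([prefix, prefix], dim=1)

with torch.no_grad():
    logits = model(input_ids).logits

# Per-token negative log-likelihood of each next token.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
targets = input_ids[:, 1:]
token_loss = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

first_copy = token_loss[:, :49].mean().item()
second_copy = token_loss[:, 50:].mean().item()
print(f"loss on first copy : {first_copy:.2f}")
print(f"loss on second copy: {second_copy:.2f}  (lower = more in-context copying)")
```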
comment in response to
post
In controlled settings, models switch from grounded in-context learning to generic information retrieval during training.
comment in response to
post
“Recognizing that a problem is literally describing Hanoi” doesn’t require reasoning even in humans. There are only so many ways to describe that exact task beyond swapping out the object types.
comment in response to
post
Wow, that’s terrible framing from the article. A model loses the thread of a long task because our current models are bad at generalized algorithm execution, which should be easy for a computer! It doesn’t “get bored”.
comment in response to
post
There is no novel way to ask about Towers of Hanoi that isn’t pretty recognizable.
comment in response to
post
If you pick an example that models fail at because they cannot execute an algorithm generally, while human children fail at it because they get bored, you have not shown the model is actually reasoning as well as a human.
comment in response to
post
What is hard for models and hard for humans are different things. It is extremely easy for a model to produce an explanation through information retrieval and light paraphrasing. Memorizing explanations is hard only for humans, easy for LLMs. For a model, it does not require even an ounce of reasoning.