neelrajani.bsky.social
PhD student in Responsible NLP at the University of Edinburgh, passionate about MechInterp
21 posts
563 followers
467 following
comment in response to
post
I think as long as there are desirable job offers in academia and industry alike that hinge on X number of papers published in "prestigious" venues, people will continue to be incentivised to grind out more papers
comment in response to
post
Super cool, can't wait!
comment in response to
post
So jealous! Ever more reasons to apply to AllenAI... Can we get a sneak peek at what the tool is saying? 🙏
comment in response to
post
At least we can look at how often it occurs in OLMo's pre-training data, but what's a smart way to do so? Regex-ing the OLMo-mix for "protolithic" surely lands me in data jail...
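(For anyone curious, a minimal sketch of what I had in mind, streaming the mix from the Hub and counting matches in a sample of documents rather than regex-ing a local copy. The dataset name and the "text" field are assumptions, not something I've verified.)

```python
# Minimal sketch (assumptions: dataset id and "text" field): stream a pre-training
# mix from the Hugging Face Hub and count documents containing a rare term,
# so the full corpus never has to be downloaded locally.
from datasets import load_dataset

TERM = "protolithic"
stream = load_dataset("allenai/olmo-mix-1124", split="train", streaming=True)

hits = 0
for i, doc in enumerate(stream):
    if TERM in doc["text"].lower():
        hits += 1
    if i >= 1_000_000:  # stop after a sample of documents, not the whole mix
        break

print(f"{hits} of the first {i + 1} documents mention '{TERM}'")
```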
comment in response to
post
Thank you, that's very kind! Credit to the ROME authors for how cool the plots look, I'm using their public GitHub code. Just posted some results comparing to the base model too :)
comment in response to
post
'late site' Attn results replicate somewhat, though this does not look as clean as their results on GPT-2-XL! There does seem to be non-negligible 'late site' MLP Indirect Effect for Llama 3.1 8B. I wonder how this affects their hypothesis? But keep in mind this is only for one Llama model! 3/3
comment in response to
post
is not in the model output, the prompt is skipped. In total, the default dataset from the ROME code contains 1209 prompts, so for the base model, only the results from ~15% of prompts make it to this graph, compared to ~71% for instruct. Again cool to see how Meng et al.'s 'early site' MLP vs. 2/3
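(A rough sketch of that filtering step, not the ROME authors' exact code: a prompt only counts if the expected answer appears in the model's greedy continuation. The prompt/answer structure here is just for illustration.)

```python
# Rough sketch of the filter described above: keep a prompt only if the expected
# answer shows up in the model's greedy continuation; otherwise skip it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

def keep_prompt(prompt: str, expected_answer: str, max_new_tokens: int = 5) -> bool:
    """Return True if the model's greedy output contains the expected answer."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    completion = tok.decode(
        out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return expected_answer.strip().lower() in completion.lower()

# e.g. keep_prompt("The Eiffel Tower is located in the city of", "Paris")
```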
comment in response to
post
I now also wish I knew about this much earlier! Ty for sharing
comment in response to
post
Awesome, thank you!! 🙏
comment in response to
post
Sounds good, looking forward!
comment in response to
post
Any chance I might be able to borrow it when you're done? :)
comment in response to
post
Hey Oliver, I'm a PhD student working on MechInterp. Was wondering if I could perhaps be added to the starter pack too? :)
comment in response to
post
Hey Julian! I'm a PhD student working on interpretability at the University of Edinburgh, was wondering if I could kindly ask to be added as well? 🙌
comment in response to
post
From a technical standpoint this is clearly impressive, but it has a really eerie quality to it. And the fact that it 'sang' the "(Fade out with improvised soul scatting)" instruction in the outro was a funny touch 😅
comment in response to
post
Thank you :)
comment in response to
post
Hey @ramandutt4.bsky.social, any chance I could kindly ask you to add me too? 🙏
comment in response to
post
2/2 My hacky attempt at changing their codebase to accept Llama 3.1 8B Instruct. Pretty cool that the 'early-site/late-site' findings replicate somewhat even on a single sample. Very curious to see my sweep over the full 1209 samples from their paper finish, for more representative results :D
comment in response to
post
🙋‍♂️