mtutek.bsky.social
Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG Faithful explainability, controllability & safety of LLMs. πŸ”Ž On the academic job market πŸ”Ž https://mttk.github.io/
37 posts 225 followers 318 following

I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering. Express interest here: docs.google.com/forms/d/e/1F...

Hi #NLP community, I'm urgently looking for an emergency reviewer for the ARR Linguistic Theories track. The paper investigates and measures orthography across many languages. Please shoot me a quick email if you can review!

Very useful Slack workspace for quickly clarifying any issues w.r.t. the OpenReview interface or the review period in general.

If you're finishing your camera-ready for ACL or ICML and want to cite co-first authors more fairly, I just made a simple fix to do this! Just add $^*$ to the authors' names in your bibtex, and the citations should change :) github.com/tpimentelms/...
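A minimal sketch of what the trick looks like in a .bib entry (names and title here are made up for illustration; the actual styling is handled by the linked fix):

```latex
@inproceedings{doe2025example,
  % Append $^*$ to each co-first author's name in the author field;
  % the citation style then renders the shared-first-authorship marker.
  author    = {Doe$^*$, Jane and Roe$^*$, John and Smith, Alice},
  title     = {An Example Paper Title},
  booktitle = {Proceedings of ACL},
  year      = {2025},
}
```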

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧡

Really cool work introducing a gradient-free method for unlearning organically memorized sensitive information from LMs! (we also curate two datasets of organically memorized sensitive information) Check out the 🧡 below and come talk to us at @aclmeeting.bsky.social in Vienna 🍻

🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive dataβ€”emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. πŸ§΅πŸ‘‡w/ @boknilev.bsky.social @mtutek.bsky.social 1/8

Our paper "Position-Aware Automatic Circuit Discovery" got accepted to ACL! πŸŽ‰ Huge thanks to my collaboratorsπŸ™ @hadasorgad.bsky.social @davidbau.bsky.social @amuuueller.bsky.social @boknilev.bsky.social See you in Vienna! πŸ‡¦πŸ‡Ή #ACL2025 @aclmeeting.bsky.social

🚨🚨 Studying the INTERPLAY of LMs' internals and behavior? Join our @colmweb.org workshop on comprehensively evaluating LMs. Deadline: June 23rd CfP: shorturl.at/sBomu Page: shorturl.at/FT3fX We're excited to see your insights and methods!! See you in Montréal 🇨🇦 #nlproc #interpretability

BlackBoxNLP has a shared task this year based on MIB - if you have circuit or causal variable localization methods you want to showcase, this is the place to show how good they are!

⚠️ Important for #NeurIPS2025 authors: remember that second-order optimization papers CANNOT be submitted under the CC BY-NC-ND 4.0 license. It requires "no derivatives."

πŸ“’ Planning to submit to EMNLP 2025? Make sure you're up to speed with new ARR policies: πŸ‘‰ aclrollingreview.org/incentives2025 #NLProc #EMNLP #ARR #ACL (1/2)

Slides available here: docs.google.com/presentation...

If you're at NAACL, come check out what Ana will be talking about!

Very interesting oral history -- interviews with some top NLP folks on the effects of GenAI on their field: www.quantamagazine.org/when-chatgpt...

Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome! go.bsky.app/NZDc31B

Very happy to have been a part of this effort to standardize eval in mechanistic interpretability! Plenty of resources in the thread & a lot of collaborators at ICLR/NAACL (they won't be wearing black, though)

πŸŽ‰ Our Actionable Interpretability workshop has been accepted to #ICML2025! πŸŽ‰ > Follow @actinterp.bsky.social > Website actionable-interpretability.github.io @talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social Paper submission deadline: May 9th!

Want to know what training data has been memorized by models like GPT-4? We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to πŸ™…β€β™€οΈ Model weights πŸ™…β€β™€οΈ Training data πŸ™…β€β™€οΈ Token probabilities 🧡 (1/5)

Just a reminder that COLM's abstract registration window is still open, and you can submit your abstract until **March 22 AoE** Full paper deadline was also extended to March 28

A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines: Abstracts due March 22 AoE (+48hr) Full papers due March 28 AoE (+24hr) Plz RT πŸ™

🚨🚨 New preprint 🚨🚨 Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829

Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasović, Yonatan Belinkov Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps https://arxiv.org/abs/2502.14829

🚨New arXiv preprint!🚨 LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🀯 We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

Our workshop on LLM Memorization is coming to ACL 2025! The call for papers is out, please submit both archival and non-archival (work in progress or already published) papers