mtutek.bsky.social
Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG Faithful explainability, controllability & safety of LLMs. πŸ”Ž On the academic job market πŸ”Ž https://mttk.github.io/
37 posts 225 followers 318 following

I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering. Express interest here: docs.google.com/forms/d/e/1F...

Hi #NLP community, I'm urgently looking for an emergency reviewer for the ARR Linguistic Theories track. The paper investigates and measures orthography across many languages. Please shoot me a quick email if you can review!

Very useful Slack workspace for quickly clarifying any issues w.r.t. the OpenReview interface or the review period in general.

If you're finishing your camera-ready for ACL or ICML and want to cite co-first authors more fairly, I just made a simple fix to do this! Just add $^*$ to the authors' names in your bibtex, and the citations should change :) github.com/tpimentelms/...
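A minimal sketch of what the trick looks like in a .bib entry (names and title here are made up for illustration; the actual styling is handled by the linked fix):

```latex
@inproceedings{doe2025example,
  % Append $^*$ to each co-first author's name in the author field;
  % the citation style then renders the shared-first-authorship marker.
  author    = {Doe$^*$, Jane and Roe$^*$, John and Smith, Alice},
  title     = {An Example Paper Title},
  booktitle = {Proceedings of ACL},
  year      = {2025},
}
```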

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧡

Really cool work introducing a gradient-free method for unlearning organically memorized sensitive information from LMs! (we also curate two datasets of organically memorized sensitive information) Check out the 🧡 below and come talk to us at @aclmeeting.bsky.social in Vienna 🍻

🚨New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive dataβ€”emails, SSNs, URLs from their training. We propose a surgical method to unlearn it. πŸ§΅πŸ‘‡w/ @boknilev.bsky.social @mtutek.bsky.social 1/8

Our paper "Position-Aware Automatic Circuit Discovery" got accepted to ACL! πŸŽ‰ Huge thanks to my collaboratorsπŸ™ @hadasorgad.bsky.social @davidbau.bsky.social @amuuueller.bsky.social @boknilev.bsky.social See you in Vienna! πŸ‡¦πŸ‡Ή #ACL2025 @aclmeeting.bsky.social

🚨🚨 Studying the INTERPLAY of LMs' internals and behavior? Join our @colmweb.org workshop on comprehensively evaluating LMs. Deadline: June 23rd CfP: shorturl.at/sBomu Page: shorturl.at/FT3fX We're excited to see your insights and methods!! See you in Montréal 🇨🇦 #nlproc #interpretability

BlackBoxNLP has a shared task this year based on MIB - if you have circuit or causal variable localization methods you want to showcase, this is the place to show how good they are!

⚠️ Important for #NeurIPS2025 authors: remember that second-order optimization papers CANNOT be submitted under the CC BY-NC-ND 4.0 license. It requires "no derivatives."

πŸ“’ Planning to submit to EMNLP 2025? Make sure you're up to speed with new ARR policies: πŸ‘‰ aclrollingreview.org/incentives2025 #NLProc #EMNLP #ARR #ACL (1/2)

Slides available here: docs.google.com/presentation...

If you're at NAACL, come check out what Ana will be talking about!

Very interesting oral history -- interviews with some top NLP folks on the effects of GenAI on their field: www.quantamagazine.org/when-chatgpt...

Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome! go.bsky.app/NZDc31B

Very happy to have been a part of this effort to standardize eval in mechanistic interpretability! Plenty of resources in the thread & a lot of collaborators at ICLR/NAACL (they won't be wearing black, though)

πŸŽ‰ Our Actionable Interpretability workshop has been accepted to #ICML2025! πŸŽ‰ > Follow @actinterp.bsky.social > Website actionable-interpretability.github.io @talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social Paper submission deadline: May 9th!

Want to know what training data has been memorized by models like GPT-4? We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to πŸ™…β€β™€οΈ Model weights πŸ™…β€β™€οΈ Training data πŸ™…β€β™€οΈ Token probabilities 🧡 (1/5)

Just a reminder that COLM's abstract registration window is still open, and you can submit your abstract until **March 22 AoE** Full paper deadline was also extended to March 28

A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines: Abstracts due March 22 AoE (+48hr) Full papers due March 28 AoE (+24hr) Plz RT πŸ™

🚨🚨 New preprint 🚨🚨 Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model? We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness. arxiv.org/abs/2502.14829

Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasović, Yonatan Belinkov Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps https://arxiv.org/abs/2502.14829

🚨New arXiv preprint!🚨 LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🀯 We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

Our workshop on LLM Memorization is coming to ACL 2025! The call for papers is out, please submit both archival and non-archival (work in progress or already published) papers