What's in an attention head? 🤯
We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨
A new preprint with Amit Elhelo 🧵 (1/10)
Here, we take a different approach, inspired by @anthropic.com and @guydar.bsky.social, and inspect the head directly in the vocabulary space 🔍 (2/10)
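Not from the thread, but for intuition: a minimal sketch of what vocabulary-space inspection of a head's parameters can look like, assuming a GPT-2-style model in HuggingFace transformers. The layer/head indices and source token are arbitrary placeholders, and this is only in the spirit of the approach, not the paper's exact MAPS procedure:

```python
# Hedged sketch: project one head's OV circuit into the vocabulary space
# (logit-lens style). Assumes GPT-2 in HuggingFace transformers; the
# (layer, head) pair and the source token are arbitrary examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

layer, head = 9, 8                        # hypothetical head to inspect
d_model, n_head = model.config.n_embd, model.config.n_head
d_head = d_model // n_head

attn = model.transformer.h[layer].attn
# GPT-2 packs Q, K, V into c_attn; V is the last third of the columns.
W_v = attn.c_attn.weight[:, 2 * d_model:]                  # (d_model, d_model)
W_v_h = W_v[:, head * d_head:(head + 1) * d_head]          # this head's V slice
W_o_h = attn.c_proj.weight[head * d_head:(head + 1) * d_head, :]
W_ov = W_v_h @ W_o_h                                       # the head's OV map

E = model.transformer.wte.weight                           # (vocab, d_model)
with torch.no_grad():
    # Which output tokens does the head promote when it reads " Paris"?
    src = tok.encode(" Paris")[0]
    scores = E[src] @ W_ov @ E.T                           # (vocab,)
    top = scores.topk(10).indices.tolist()
print([tok.decode([i]) for i in top])
```

Reading the W_V·W_O product through the embedding matrix turns a parameter matrix into token-to-token scores, which is what makes inspection "directly from the parameters" possible.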
MAPS interprets each head via two types of token groups (toy scoring sketch below):
(A) Predefined relations: groups expressing certain relations (e.g. the city of a country)
(B) Salient operations: groups for which the head induces its most prominent effect (3/10)
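For intuition on (A), here is a hedged sketch of scoring one head against a predefined relation; the relation pairs, model, scoring metric, and head index are illustrative assumptions, not the paper's setup:

```python
# Hedged sketch: score a head's OV map against a "city of a country"
# relation via mean reciprocal rank. Model, pairs, and head are toys.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
cfg = model.config
d_head = cfg.n_embd // cfg.n_head

def head_ov(layer, head):
    """W_V @ W_O restricted to one head (GPT-2 packs V into c_attn)."""
    attn = model.transformer.h[layer].attn
    W_v = attn.c_attn.weight[:, 2 * cfg.n_embd:]
    W_v_h = W_v[:, head * d_head:(head + 1) * d_head]
    W_o_h = attn.c_proj.weight[head * d_head:(head + 1) * d_head, :]
    return W_v_h @ W_o_h

# Toy relation pairs (city -> country); a real evaluation would use a
# larger, curated set.
pairs = [(" Paris", " France"), (" Rome", " Italy"), (" Tokyo", " Japan")]
E = model.transformer.wte.weight

@torch.no_grad()
def relation_score(layer, head):
    """Mean reciprocal rank of the target token under the head's OV map
    applied to the source token's embedding."""
    W_ov = head_ov(layer, head)
    total = 0.0
    for src, tgt in pairs:
        s, t = tok.encode(src)[0], tok.encode(tgt)[0]
        scores = E[s] @ W_ov @ E.T
        rank = int((scores > scores[t]).sum()) + 1
        total += 1.0 / rank
    return total / len(pairs)

print(relation_score(layer=9, head=8))    # hypothetical head
```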
Ablating heads that implement an operation damages the model's ability to perform tasks requiring that operation more than removing other heads does (4/10)
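A minimal sketch of that kind of ablation check, again assuming GPT-2 in HuggingFace transformers; the probe prompt and head choice are toy placeholders, and comparing against ablating other heads is left as the obvious loop:

```python
# Hedged sketch: zero one head's contribution via a forward pre-hook on
# the output projection, then compare task loss with and without it.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
d_head = model.config.n_embd // model.config.n_head

def ablate(layer, head):
    """Zero one head's slice of the concatenated attention output before
    the W_O projection (a common zero-ablation approximation)."""
    def hook(module, inputs):
        x = inputs[0].clone()
        x[..., head * d_head:(head + 1) * d_head] = 0.0
        return (x,)
    return model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(hook)

@torch.no_grad()
def seq_loss(text):
    # Language-modeling loss over the whole sequence, as a task proxy.
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss.item()

probe = "The capital of France is Paris"  # toy task probe
base = seq_loss(probe)
handle = ablate(layer=9, head=8)          # hypothetical head
ablated = seq_loss(probe)
handle.remove()
print(f"loss: {base:.3f} -> {ablated:.3f}")
```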
(2) Different heads implement the same relation to varying degrees, which has implications for localization and editing of LLMs (6/10)
(4) In Llama-3.1 models, which use grouped-query attention, grouped heads often implement the same or similar relations (7/10)
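For reference, a tiny illustration of what "grouped" means under grouped-query attention, using Llama-3.1-8B's public config values (32 query heads sharing 8 KV heads); the finding above concerns query heads in the same group:

```python
# GQA grouping: each KV head serves a contiguous block of query heads.
n_heads, n_kv_heads = 32, 8               # Llama-3.1-8B config values
group_size = n_heads // n_kv_heads        # 4 query heads per KV head

groups = {kv: list(range(kv * group_size, (kv + 1) * group_size))
          for kv in range(n_kv_heads)}
print(groups[0])                          # query heads 0-3 share KV head 0
```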