📄👀: cross-modal information flow in multimodal large language models
neat interpretability work on how visual and linguistic information is integrated in MLLMs: "the model first transfers the more general visual features of the whole image into the representations of (linguistic) question tokens."
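To make the claim concrete, here is a minimal sketch (not the paper's code) of the kind of attention-knockout probe used in this line of interpretability work: block attention from question-token queries to image-token keys over a chosen layer range and measure how the answer degrades. The sequence length, token positions, and layer choice below are illustrative assumptions.

```python
import torch

def knockout_mask(seq_len, image_pos, question_pos):
    """Additive attention mask: -inf where question tokens would attend to image tokens."""
    mask = torch.zeros(seq_len, seq_len)
    for q in question_pos:
        for k in image_pos:
            mask[q, k] = float("-inf")
    return mask

# Example: 6 image tokens followed by 4 question tokens in a 10-token sequence.
mask = knockout_mask(10, image_pos=range(0, 6), question_pos=range(6, 10))
print(mask[7])  # row for one question token: -inf over the image positions
```

Adding this mask to the attention logits in early layers (vs. late layers) and comparing answer probabilities is how one would test whether the image-to-question transfer happens first.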