(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @tomerullman.bsky.social and @jennhu.bsky.social, to appear at #NAACL2025! 🐟
We want models that match our values...but could this hurt their diversity of thought?
Preprint: https://arxiv.org/abs/2411.04427
* No model reaches human-like diversity of thought.
* Aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.