(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @tomerullman.bsky.social and @jennhu.bsky.social, to appear at #NAACL2025! 🐟
We want models that match our values...but could this hurt their diversity of thought?
Preprint: https://arxiv.org/abs/2411.04427
* No model reaches human-like diversity of thought.
* Aligned models show LESS conceptual diversity than their instruction fine-tuned counterparts.