This corresponds to our observations (in a different setting) of vocabulary collapse when models are trained on their own outputs (basically all of RLHF).
https://bsky.app/profile/yoavartzi.com/post/3l6zvosdumm2i
Did you look at pre-post-training models?
(show some hyphen love ❤️)
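A rough, hypothetical illustration of the "vocabulary collapse" point above (not taken from the linked paper): a distinct-n style metric counts the fraction of unique n-grams across sampled outputs, and it tends to shrink as a model is repeatedly fine-tuned on its own generations.

```python
# Sketch only: distinct-n as a simple proxy for output diversity.
from collections import Counter


def distinct_n(samples: list[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams over a set of generated texts."""
    ngrams = Counter()
    for text in samples:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i : i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


# Example: samples from an early checkpoint vs. a later one trained on its own outputs.
early = ["the red mug on the left shelf", "a small blue cup near the window"]
late = ["the cup on the left", "the cup on the left shelf"]
print(distinct_n(early), distinct_n(late))  # the later samples score lower
```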
Reposted from Yoav Artzi
New paper!
Models that learn from feedback train on their own outputs, so you see performance 📈 but language diversity 📉. We show that if you couple comprehension and generation you learn faster 🏎️ AND get richer language!
arxiv.org/abs/2408.15992
Demo and video ⬇ + in EMNLP!
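A minimal sketch of the coupling idea in the announcement, with assumptions of mine (the toy heads, losses, and shapes below are not the paper's architecture): one shared representation feeds both a speaker (generation) head and a listener (comprehension) head, and every update sums both losses, so generation is never trained without comprehension.

```python
# Sketch only: coupled speaker/listener training on shared parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoupledSpeakerListener(nn.Module):
    def __init__(self, n_targets=8, n_vocab=100, dim=32):
        super().__init__()
        self.shared = nn.Embedding(n_vocab, dim)   # shared representation
        self.speaker = nn.Linear(dim, n_vocab)     # stand-in generation head (scores next token)
        self.listener = nn.Linear(dim, n_targets)  # comprehension head (scores the referent)

    def forward(self, utterance_tokens):
        h = self.shared(utterance_tokens).mean(dim=1)  # pool token embeddings
        return self.speaker(h), self.listener(h)


model = CoupledSpeakerListener()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: utterance token ids, a next-token target, and the referent each utterance picks out.
utt = torch.randint(0, 100, (4, 6))
next_tok = torch.randint(0, 100, (4,))
referent = torch.randint(0, 8, (4,))

speak_logits, listen_logits = model(utt)
loss = F.cross_entropy(speak_logits, next_tok) + F.cross_entropy(listen_logits, referent)
opt.zero_grad()
loss.backward()
opt.step()
```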