This corresponds to our observations (in a different setting) of vocabulary collapse when models are trained on their own outputs (basically all of RLHF).
https://bsky.app/profile/yoavartzi.com/post/3l6zvosdumm2i
Did you look at pre-post-training models?
(show some hyphen love ❤️)
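A rough, hypothetical illustration of the "vocabulary collapse" point above (not taken from the linked paper): a distinct-n style metric counts the fraction of unique n-grams across sampled outputs, and it tends to shrink as a model is repeatedly fine-tuned on its own generations.

```python
# Sketch only: distinct-n as a simple proxy for output diversity.
from collections import Counter


def distinct_n(samples: list[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams over a set of generated texts."""
    ngrams = Counter()
    for text in samples:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i : i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


# Example: samples from an early checkpoint vs. a later one trained on its own outputs.
early = ["the red mug on the left shelf", "a small blue cup near the window"]
late = ["the cup on the left", "the cup on the left shelf"]
print(distinct_n(early), distinct_n(late))  # the later samples score lower
```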
Reposted from Yoav Artzi
New paper!
Models that learn from feedback train on their own outputs, so you see performance 📈 but language diversity 📉. We show that if you couple comprehension and generation you learn faster 🏎️ AND get richer language!
arxiv.org/abs/2408.15992
Demo and video ⬇ + in EMNLP!
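A minimal sketch of the coupling idea in the announcement, with assumptions of mine (the toy heads, losses, and shapes below are not the paper's architecture): one shared representation feeds both a speaker (generation) head and a listener (comprehension) head, and every update sums both losses, so generation is never trained without comprehension.

```python
# Sketch only: coupled speaker/listener training on shared parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoupledSpeakerListener(nn.Module):
    def __init__(self, n_targets=8, n_vocab=100, dim=32):
        super().__init__()
        self.shared = nn.Embedding(n_vocab, dim)   # shared representation
        self.speaker = nn.Linear(dim, n_vocab)     # stand-in generation head (scores next token)
        self.listener = nn.Linear(dim, n_targets)  # comprehension head (scores the referent)

    def forward(self, utterance_tokens):
        h = self.shared(utterance_tokens).mean(dim=1)  # pool token embeddings
        return self.speaker(h), self.listener(h)


model = CoupledSpeakerListener()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: utterance token ids, a next-token target, and the referent each utterance picks out.
utt = torch.randint(0, 100, (4, 6))
next_tok = torch.randint(0, 100, (4,))
referent = torch.randint(0, 8, (4,))

speak_logits, listen_logits = model(utt)
loss = F.cross_entropy(speak_logits, next_tok) + F.cross_entropy(listen_logits, referent)
opt.zero_grad()
loss.backward()
opt.step()
```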