Fascinating interview.

One snippet describing malleable "pleasing" responses: "Eichstaedt: At some level of abstraction they've seen this behavior in the training data and it’s been implied by their reinforcement learning from human feedback — their last training step, we think."
Reposted from Stanford HAI
When Stanford researchers surveyed LLMs on the “big five” personality traits, the models started to bend their answers toward what society values. hai.stanford.edu/news/large-l...
