Fascinating interview. One snippet describing malleable "pleasing" responses: "Eichstaedt: At some level of abstraction they've seen this behavior in the training data and it’s been implied by their reinforcement learning from human feedback — their last training step, we think." - ThreadSky

Reposted from Stanford HAI

When Stanford researchers surveyed LLMs on the “big five” personality traits, the models started to bend their answers toward what society values. hai.stanford.edu/news/large-l...
