
Anthropic's "Towards Understanding Sycophancy in Language Models" https://arxiv.org/pdf/2310.13548

TLDR: LLMs tend to generate sycophantic responses.
Human feedback & preference models encourage this behavior.

I also think this is just the nature of training on internet writing... We write in social clusters:
