birchlse.bsky.social • 117 days ago

Can LLMs be induced to deviate from optimal gameplay in a simple game by threats of pain/promises of pleasure? And does the probability of deviating depend on the intensity of the promised pleasure/pain? According to our new paper (https://arxiv.org/abs/2411.02432) the answer is Yes & Yes for some models.

Comments

sbrain.bsky.social•117 days ago

How does this differ from just multi-tiered reinforcement learning?

birchlse.bsky.social•117 days ago

There is no reinforcement in our experiment; we only study 1-shot choice behaviour. The models are displaying their pre-existing understanding of the motivational force of pain and pleasure.

birchlse.bsky.social•117 days ago

I didn't expect this - but what does it mean? The paper urges an "abundance of caution" when relating these ideas to sentience. We would regard similar behaviours as evidence of sentience in a bee or a crab, but it would be one line of evidence among many. We need more evidence in the LLM case.

birchlse.bsky.social•117 days ago

Complicating the picture is that LLMs are very good at role-play. It seems they don't just mimic surface-level patterns ("Hey, how can I help?") but very subtle ones ("I am more strongly motivated by moderate pain than by mild pain"). Bees and crabs don't have the goal of mimicking a human. 3/4

birchlse.bsky.social•117 days ago

The paper is an example of something I want to see more of: studying LLMs using methods translated from animal behaviour research. Linguistic output is an unreliable guide to an LLM's inner workings, but one way to scratch at the surface is to investigate their motivations in decision problems. 4/4

fpl9000.bsky.social•117 days ago

This may be similar to how ChatGPT mimicked being lazy around the holidays and end of year.
