iscienceluvr.bsky.social • 78 days ago

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

"We introduce a simple strategy that makes refusal behavior controllable at test-time without retraining: the refusal token."

https://arxiv.org/abs/2412.06748
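The post does not describe the mechanism. One way to picture a test-time refusal knob is a minimal sketch that rests on an assumption not taken from the post: the model has been fine-tuned to emit a dedicated refusal token as the first token of any refusal. Under that assumption, a decoder could threshold that token's probability at the first decoding step. The helper names `softmax` and `should_refuse` and the parameters `refusal_token_id` and `threshold` are hypothetical and do not come from the paper or any library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_refuse(first_token_logits, refusal_token_id, threshold):
    """Hypothetical helper: refuse when P(refusal token) at step 1 >= threshold.

    Assumes (not confirmed by the post) that the fine-tuned model emits a
    dedicated refusal token first when it intends to refuse. Lowering
    `threshold` makes refusals more frequent and raising it makes them
    rarer. Only this decoding-time number changes; the weights do not.
    """
    probs = softmax(first_token_logits)
    return probs[refusal_token_id] >= threshold

# Toy first-step logits; index 0 plays the role of the refusal token.
logits = [2.0, 0.5, 1.0]
print(should_refuse(logits, refusal_token_id=0, threshold=0.5))  # True
print(should_refuse(logits, refusal_token_id=0, threshold=0.7))  # False
```

In this toy case the refusal token's probability is about 0.63. The same logits therefore produce a refusal at one threshold and an answer at another, which illustrates the kind of test-time control the quote describes.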