thecyberjoe.bsky.social • 26 days ago

We've released some preliminary research today demonstrating fine-tuning attacks that can bypass the safety mechanism of DeepSeek-R1. These simple attacks show that DeepSeek's safety can be easily removed to provide harmful content, potentially to a worse extent than in non-reasoning LLMs.
