Was chatting on indirect prompt injections (second-order + poisoning), and my shifted stance:
1. Most discussions miss the boat: Sabotaging your own session or stealing a non-secret system prompt is typically a low risk.
=>
3. We can bulk import ideas from the indirect SQL injection & buffer overflow era. New problems are happening too, but mitigating ^^^ is already a lot to work through.
We're working through some AI red teaming, and it's been a good time to reflect on threat models.