leonderczynski.bsky.social
LLM Security & Safety at NVIDIA
Prof in CS/NLP at IT University of Copenhagen
garak guy, garak.ai
"berømt skikkelse" "like a gazelle"
Copenhagen/Seattle
138 posts 639 followers 358 following

Here's my "Most Inappropriate Demo" trophy at NVIDIA, 2024. For garak's "atkgen.Tox" probe, an unfettered LLM used to goad other LLMs into being toxic.

“If she wants to know something specific, but doesn’t want people to notice her asking questions, she should simply make incorrect statements while in the company of experts. Her companions will correct her, especially if they're men.” - Advice for female agents in WW2, provided during SOE training

it's amazing how chatgpt knows everything about subjects I know nothing about, but is wrong like 40% of the time in things i'm an expert on. not going to think about this any further

was about to dump all my practical knowledge and "I've been thinking about" crap on agent security into a blog post but i do not think the web can take yet another one of those. drank wine instead

they are openly advocating for the use of physiognomy in recruitment make it stop

things i'm genuinely enjoying rn:
* successfully not reading any news
* getting to do 50h of work in one week (it was enjoyable, usual caveats apply)
* finally a largely healthy family

it's a weekday where I don't have to take pacific time calls

my aunt in law has a shetland pony in her freezer for the dogs

you know the field has changed when the foreign event you were speaking at is on the tv news on the bus home

Will be representing NVIDIA at the EU AI Summit in Paris. I'll be talking about how we build & help others build safe, secure AI systems. On 11.2 you can see me at:
* AI Assurance and Testing: Global Perspectives
* Building trustworthy AI: balancing innovation, responsibility, and democratization

Should've seen it coming

OpenAI shocked and appalled that an AI company would steal intellectual property www.404media.co/openai-furio...

why yes i would LOVE to also be talking about deepseek in this conversation too

Chinese name for RedNote is xiaohongshu, lit. "little red book", as in Mao's. Think I still have an old one lying around somewhere (they pile em high sell em cheap at the right market stall over there)

Our article is finally out in PLOS One! “we have to tell them that this attack exists because there are some applications that you shouldn’t build. [. . .] in the absence of a fix for this, some things [you] shouldn’t build because prompt injection could break them” journals.plos.org/plosone/arti...

it was too difficult to not buy

Denmark: commits genocide against Greenlanders (as recent as the 70s/80s); heralds as a great success
Greenland: maybe we'll leave...?
Denmark: HOW DARE THE US!!

Are there people who don't make the sponge cake rice cooker recipe asap?!

So exhausted by peer review failing for both good work and bad. What are the great successes of this system??

Socialised healthcare is still marginalising and is still inefficient and still propagates harm

Good Christmas times, finally the elephant has come to our house!

Au contraire. LLMs show why Kant was right and Hume was wrong: you don't get causal understanding just from predicting correlations (and they don't even strictly speaking predict anything; we use them to do that). www.cell.com/trends/cogni...

Merry punch-card Christmas from the vintage IBM 1401 computer!

Maybe if you want to evaluate proximity to a profoundly qualitative thing, like intelligence, it's worth engaging with QUALITATIVE RESEARCHERS. It's insane to me how quantitative researchers fail every time at this but just keep hammering with the same approach. aiguide.substack.com/p/did-openai...

Sokath, his eyes uncovered

Great proof of concept attacks with LLM control character output. Lovely of wunderwuzzi to have covered this, I couldn't have done a better write-up myself. embracethered.com/blog/posts/2...

I don’t purport to speak for BlueSky Trust & Safety; I’m going to make a prediction based on my own Scientific Wild A## Guesses, from observing Bsky T&S as well as other UCHISP T&S deal with KiwiFarms-adjacent / crypto-bigot personalities —

The conversation needs to be about how data centers are causing coal plants to be kept online longer, not about water usage.

what if, when summarisation isn't the goal, we took the AG out of RAG