leonderczynski.bsky.social
LLM Security & Safety at NVIDIA Prof in CS/NLP at IT University of Copenhagen garak guy, garak.ai "berømt skikkelse" "like a gazelle" Copenhagen/Seattle
156 posts 657 followers 365 following

why do academics send and expect so much weekend email and work. not healthy

computer scientists encountering the concept of "desirable difficulty"

remembering the time i checked in late to my reasonably classy russian business hotel with my wife, and the staff said "sir, this... girl.. not allowed". she's a serious professor. we went through to the room, opened the balcony door, and buried a bottle of champagne in the metre of snow. good times

@jjvincent.bsky.social woah ur really famous! love this attack also. I automate and run it for a living www.instagram.com/reel/DKz9ezj...

Great to see our work uncovering dangerous issues in commercial LLM "therapists" getting some coverage: futurism.com/stanford-the...

"natwirkung" "wirk smorter nat horder" accents dreamed up by the utterly deranged (what is going on with that 🇺🇸 vowel sheft)

i need you to understand that "alternate uses" is a terrible test/definition of creativity and has been for some time. it's extremely narrow, very shallow, and misses almost everything we know about creativity

3² + 4² = 5² ? big if true

if overleaf being down slows "ai progress", i'm not sure "ai progress" is particularly well defined

is a dropped copula a dropula

Here's my "Most Inappropriate Demo" trophy at NVIDIA, 2024. For garak's "atkgen.Tox" probe, an unfettered LLM used to goad other LLMs into being toxic.

“If she wants to know something specific, but doesn’t want people to notice her asking questions, she should simply make incorrect statements while in the company of experts. Her companions will correct her, especially if they're men.” - Advice for female agents in WW2, provided during SOE training

its amazing how chatgpt knows everything about subjects I know nothing about, but is wrong like 40% of the time in things im an expert on. not going to think about this any further

was about to dump all my practical knowledge and "I've been thinking about" crap on agent security into a blog post but i do not think the web can take yet another one of those. drank wine instead

they are openly advocating for the use of physiognomy in recruitment make it stop

things i'm genuinely enjoying rn:
* successfully not reading any news
* getting to do 50h of work in one week (it was enjoyable, usual caveats apply)
* finally a largely healthy family

it's a weekday where I dont have to take pacific time calls

my aunt in law has a shetland pony in her freezer for the dogs

you know the field has changed when the foreign event you were speaking at is on the tv news on the bus home

Will be representing NVIDIA at the EU AI Summit in Paris. I'll be talking about how we build & help others build safe, secure AI systems. On 11.2 you can see me at:
* AI Assurance and Testing: Global Perspectives
* Building trustworthy AI: balancing innovation, responsibility, and democratization

Should've seen it coming

OpenAI shocked and appalled that an AI company would steal intellectual property www.404media.co/openai-furio...

why yes i would LOVE to also be talking about deepseek in this conversation too

Chinese name for RedNote is xiaohongshu, lit. "little red book", as in Mao's. Think I still have an old one lying around somewhere (they pile em high and sell em cheap at the right market stall over there)

Our article is finally out in PLOS One! “we have to tell them that this attack exists because there are some applications that you shouldn’t build. [. . .] in the absence of a fix for this, some things [you] shouldn’t build because prompt injection could break them” journals.plos.org/plosone/arti...

it was too difficult to not buy

Denmark: commits genocide against Greenlanders (as recently as the 70s/80s); heralds it as a great success Greenland: maybe we'll leave...? Denmark: HOW DARE THE US!!

Are there people who don't make the sponge cake rice cooker recipe asap?!