leonderczynski.bsky.social - Profile | ThreadSky | a Reddit-style client for Bluesky

Here's my "Most Inappropriate Demo" trophy at NVIDIA, 2024. For garak's "atkgen.Tox" probe, an unfettered LLM used to goad other LLMs into being toxic.

submitted 43 days ago • 0 comments

“If she wants to know something specific, but doesn’t want people to notice her asking questions, she should simply make incorrect statements while in the company of experts. Her companions will correct her, especially if they're men.” - Advice for female agents in WW2, provided during SOE training

submitted 45 days ago • 141 comments

its amazing how chatgpt knows everything about subjects I know nothing about, but is wrong like 40% of the time in things im an expert on. not going to think about this any further

submitted 55 days ago • 86 comments

was about to dump all my practical knowledge and "I've been thinking about" crap on agent security into a blog post but i do not think the web can take yet another one of those. drank wine instead

submitted 69 days ago • 0 comments

they are openly advocating for the use of physiognomy in recruitment make it stop

submitted 69 days ago • 22 comments

things i'm genuinely enjoying rn:
* successfully not reading any news
* getting to do 50h of work in one week (it was enjoyable, usual caveats apply)
* finally a largely healthy family

submitted 70 days ago • 0 comments

it's a weekday where I dont have to take pacific time calls

submitted 73 days ago • 0 comments

my aunt in law has a shetland pony in her freezer for the dogs

submitted 74 days ago • 0 comments

you know the field has changed when the foreign event you were speaking at is on the tv news on the bus home

submitted 77 days ago • 0 comments

Will be representing NVIDIA at the EU AI Summit in Paris. I'll be talking about how we build & help others build safe, secure AI systems. On 11.2 you can see me at:
* AI Assurance and Testing: Global Perspectives
* Building trustworthy AI: balancing innovation, responsibility, and democratization

submitted 82 days ago • 0 comments

Should've seen it coming

submitted 89 days ago • 1 comment

OpenAI shocked and appalled that an AI company would steal intellectual property www.404media.co/openai-furio...

submitted 92 days ago • 126 comments

why yes i would LOVE to also be talking about deepseek in this conversation too

submitted 92 days ago • 0 comments

Chinese name for RedNote is xiaohongshu, lit. "little red book", as in Mao's. Think I still have an old one lying around somewhere (they pile em high sell em cheap at the right market stall over there)

submitted 104 days ago • 1 comment

Our article is finally out in PLOS One! “we have to tell them that this attack exists because there are some applications that you shouldn’t build. [. . .] in the absence of a fix for this, some things [you] shouldn’t build because prompt injection could break them” journals.plos.org/plosone/arti...

submitted 105 days ago • 1 comment

it was too difficult to not buy

submitted 109 days ago • 0 comments

Denmark: commits genocide against Greenlanders (as recent as the 70s/80s); heralds as a great success
Greenland: maybe we'll leave...?
Denmark: HOW DARE THE US!!

submitted 112 days ago • 2 comments

Are there people who don't make the sponge cake rice cooker recipe asap?!

submitted 116 days ago • 1 comment

So exhausted by peer review failing for both good work and bad. What are the great successes of this system??

submitted 122 days ago • 0 comments

Socialised healthcare is still marginalising and is still inefficient and still propagates harm

submitted 123 days ago • 0 comments

Good Christmas times, finally the elephant has come to our house!

submitted 124 days ago • 0 comments

Au contraire. LLMs show why Kant was right and Hume was wrong: you don't get causal understanding just from predicting correlations (and they don't even strictly speaking predict anything; we use them to do that). www.cell.com/trends/cogni...

submitted 125 days ago • 6 comments

Merry punch-card Christmas from the vintage IBM 1401 computer!

submitted 127 days ago • 3 comments

Maybe if you want to evaluate proximity to a profoundly qualitative thing, like intelligence, it's worth engaging with QUALITATIVE RESEARCHERS. It's insane to me how quantitative researchers fail every time at this but just keep hammering with the same approach. aiguide.substack.com/p/did-openai...

submitted 129 days ago • 0 comments

Sokath, his eyes uncovered

submitted 129 days ago • 0 comments

Great proof of concept attacks with LLM control character output. Lovely of wunderwuzzi to have covered this, I couldn't have done a better write-up myself. embracethered.com/blog/posts/2...

submitted 139 days ago • 0 comments

I don’t purport to speak for BlueSky Trust & Safety; I’m going to make a prediction based on my own Scientific Wild A## Guesses, from observing Bsky T&S as well as other UCHISP T&S deal with KiwiFarms-adjacent / crypto-bigot personalities —

submitted 144 days ago • 1 comment

The conversation needs to be about how data centers are causing coal plants to be kept online longer, not about water usage.

submitted 141 days ago • 5 comments

what if, when summarisation isn't the goal, we took the AG out of RAG

submitted 141 days ago • 0 comments