j11y.io
🏳️🌈 j11y.io // author, engineer, stroke survivor, epileptic. I live in Beijing. I build book recs on ablf.io and work on AI governance at @cip.org
328 posts
173 followers
177 following
comment in response to
post
They should be more transparent. But then, why would they? Why would an encyclopedia seller tell you that its encyclopedias are only ~half right ~half of the time?
comment in response to
post
Instead you get faux positive framing with different price levels. No allusions to accuracy, general knowledge, other abilities. They just talk about speed and cost.
comment in response to
post
Oh yep no doubt. I agree. To me, the DSM is very damaging and naively misrepresents many underlying traumas and creates arbitrary buckets deemed pathologies. It’s been hugely problematic. I can just imagine some random clinical psychologists having a field day with this AI stuff.
comment in response to
post
Not long before AI related pathologies work their way into the DSM. Scary.
comment in response to
post
There is definitely a fluency piece to this. Like any language. You are not writing in English. You may think you are. You are writing in latent vector space, which happens to be an artefact of a mostly English corpus.
comment in response to
post
One thing I end up doing a lot, because I've played a tonne with these over the years, is forking off new contexts and changing what the agent can "see". And another, more subtle thing: staying utterly conscious of how I'm "leading on" or enabling LLMs' native sycophancy.
comment in response to
post
System Cards released by labs don't go far enough. They should be running rich cross-cultural knowledge evaluations and sharing blindspots for each model they release. Not doing this means these models will creep into everyday applications with no trigger for implementors to stop and check.
comment in response to
post
I picked the Geneva Conventions as an example because they're a well-trodden piece of training material and quite crucial in the fabric of human society across the planet. We should expect all models to be able to hold this knowledge. Or for them to be transparent in their ignorance.
comment in response to
post
And sure, we can't expect all models to know everything, but they should probably be good at knowing what they don't know. Alas, that knowledge of ignorance is itself a level of insight that many models simply don't have.
comment in response to
post
AI labs allow you to pick their models (standard, mini, nano variants) on fuzzy ratings like intelligence, cost and speed. But you have no idea what you're losing in terms of knowledge integration.
comment in response to
post
Yes, specific legal texts of such weight are something we'd expect fine-tuned and RAG-enabled agents to be used for. But that is an empty hope. The reality is that generic frontier models are increasingly relied upon for truth, and so we need to ensure that they consistently provide it.
comment in response to
post
It's literally just them standing together co-wanking.
comment in response to
post
Yes!! You can make something like 100k big-param LLM inferences before hitting the carbon cost of just ONE bitcoin transaction, so, yeah, it's kinda weird people are so fixated.
comment in response to
post
Agreeeeeed!
comment in response to
post
With enough ones and zeros you have CNNs, transformers and then AI that can produce language and meaning.
CS seems no longer divorced from fleshy human reality. Sure tho, at a low enough level, the bits flipping aren’t political I guess.
comment in response to
post
Hmm, well, commenting only on the usage I've seen of it. It seems to inspire essentially the same issue we had with inline styles many years ago, except more muddled under new syntax and classes. Very long class names in aggregate. No separation of content and style.