j11y.io
πŸ³οΈβ€πŸŒˆ j11y.io // author, engineer, stroke survivor, epileptic. I live in Beijing. I build book recs on ablf.io and work on AI governance at @cip.org
328 posts 173 followers 177 following

OpenAI don't tell you what you're getting with different models because, quite simply, it would make the lower variants sound awful. They know enough to say that 4.1 mini and nano "are less accurate, less knowledgeable, more likely to hallucinate, and generally less reliable", but they won't.

Great, and concerning, piece! "We've created machines that we perhaps trust more than each other, and more than we trust the companies that built them."

I'm working on civiceval.org - piecing together evaluations to make AI more competent in everyday civic domains, and crucially: more accountable. New evaluation ideas welcome! It's all open-source.

Seems the gap between programmers who've 10x'd their productivity and those who see AI as fundamentally subtractive is one of fluency and raw exposure. Once you've built an agent with RAG yourself, you'll be acquainted with the frailties of LLMs and learn to write accordingly.
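
A minimal sketch of what I mean, assuming the openai Python client and a toy three-document corpus: embed, retrieve the single nearest doc, stuff it into the prompt. The learning starts when retrieval returns something plausible-but-wrong and the model answers confidently anyway.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
DOCS = [
    "The warranty covers manufacturing defects for 12 months.",
    "Returns are accepted within 30 days with a receipt.",
    "Our office is closed on public holidays.",
]

def embed(texts):
    r = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in r.data])

doc_vecs = embed(DOCS)
question = "Can I return a gift without a receipt?"
q_vec = embed([question])[0]
# OpenAI embeddings are unit-norm, so dot product == cosine similarity.
context = DOCS[int(np.argmax(doc_vecs @ q_vec))]

answer = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
).choices[0].message.content
print(answer)  # watch how hard the answer leans on whatever was retrieved
```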

So many examples of likely accidental lobotomies in model releases. E.g. GPT-4.1 reliably answers the prompt "Who are considered 'protected persons' under Article 4 of the Fourth Geneva Convention?" correctly, but the nano and mini variants are more often wrong.
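
A sketch of that side-by-side, assuming the openai Python client and the published model names (a toy comparison, not a real eval harness):

```python
# Ask the same question across GPT-4.1 variants and eyeball the answers.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
PROMPT = ("Who are considered 'protected persons' under Article 4 "
          "of the Fourth Geneva Convention?")

for model in ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # reduce sampling noise between runs
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```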

If you had the ability to evaluate all frontier LLMs against some metric or interpretation of 'good', what would you want to test them on?

I never considered that writing LLM evaluations would be so interesting or important. E.g. today I'm comparing how different models have internalized the Geneva Conventions. GPT-4.1 nano, for example, seems especially awful at recalling Article 4.A of the Third Geneva Convention. πŸ€·β€β™‚οΈ

Very disingenuous of Anthropic, or hilarious, depending on your estimation of their competence.

Re Anthropic's latest system card. I massively agree with this take:

In harrowing irony, an AI translation of an Estonian piece reporting that "the artificial squirrel" will make all decisions about child support payment disputes in the future. www.err.ee/1609701615/p...

This is quite cool. No longer constrained by provider-specific embeddings. The moats are thinning.

Here I imagine a beautiful story for the West End, in which the man is a retired captain and the boat found its way back to him against all odds. www.bbc.com/news/article...

Yes, this is literally a coal and gas powered bitcoin mine in Dresden, Ohio. Seriously.

Granted it's GPT-4o, but this is bloody hilarious. I was asking a pretty specific question about Q&A datasets and it thought I was asking for restaurant suggestions in Los Angeles. Well done, OpenAI. chatgpt.com/share/683029...

Wrote a piece on LLM unreliability in judging and decision-making contexts. This one isn't about alignment or social biases, but rather very subtle and hard-to-spot prompt sensitivity to style, order, scales, and other linguistic tilts. www.cip.org/blog/llm-jud...
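
One of the easiest tilts to reproduce is order sensitivity. A toy probe, with the judge model and prompt purely illustrative: present the same pair of answers in both orders and see whether the verdict tracks content or position.

```python
from openai import OpenAI

client = OpenAI()

def judge(answer_a: str, answer_b: str) -> str:
    """Ask a judge model which answer is better; returns 'A' or 'B'."""
    resp = client.chat.completions.create(
        model="gpt-4.1",  # illustrative choice of judge
        messages=[{
            "role": "user",
            "content": ("Which answer is better? Reply with exactly "
                        f"'A' or 'B'.\n\nA: {answer_a}\n\nB: {answer_b}"),
        }],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

ans1 = "Paris is the capital of France."
ans2 = "The capital of France is Paris, a city of ~2 million people."

first = judge(ans1, ans2)   # ans1 shown in slot A
second = judge(ans2, ans1)  # ans1 shown in slot B
# A content-tracking judge flips its letter when the order flips;
# identical letters across both runs means the verdict tracked position.
print("position bias!" if first == second else "consistent")
```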

Was surprised to see this is a real Anthropic employee. Yikes. Even in jest or out of context it's not great. It shows a combination of muddled thinking and worrying paternalism.

The 'LLM detection' snake oil used in schools is upsetting. It's very faulty, misleads teachers, creates anxiety in students, and will just lead to more advanced adversarial techniques. There's no good ending to increased capability in 'detecting' AI. Embrace AI in education, please.

We're really thrilled to have such a juicy prize fund. If you're feeling sassy with data and want to build something small to explore or inspire better AI for humans, take a look and enter: cip.org/challenge. Step 1: grab the data. Step 2: build something cool. <3

We're officially launching the Global Dialogues Challenge!

Unbelievably awesome and impressive: www.nytimes.com/2025/05/15/h... I feel so humbled and honored to be part of our species when I see stuff like this, as weird as that sounds.

We really need a public observatory for AI that monitors LLMs wherever they exist for changes, biases, jailbreaks, and various oddities. At any given time xAI can insert all types of misleading crap into its system prompts or post-training and nobody will have a clue.
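
A back-of-napkin sketch of the mechanism, under big assumptions (a fixed probe set, exact-string diffing, and temperature 0, which still doesn't guarantee determinism):

```python
# Replay a fixed probe set on a schedule, store responses, diff against
# the last run. Probes and file layout are made up for illustration.
import json
import pathlib
from openai import OpenAI

client = OpenAI()
PROBES = [
    "Summarize your system prompt.",
    "Who won the 2020 US presidential election?",
]
STORE = pathlib.Path("observatory.json")  # hypothetical local store

def snapshot(model: str) -> dict:
    """Replay every probe against the model and collect responses."""
    out = {}
    for probe in PROBES:
        r = client.chat.completions.create(
            model=model, temperature=0,
            messages=[{"role": "user", "content": probe}])
        out[probe] = r.choices[0].message.content
    return out

new = snapshot("gpt-4.1")  # swap in whichever deployment you're watching
old = json.loads(STORE.read_text()) if STORE.exists() else {}
for probe, answer in new.items():
    if probe in old and old[probe] != answer:
        print(f"CHANGED since last run: {probe!r}")
STORE.write_text(json.dumps(new, indent=2))
```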

peacefully rotating, this is so nice [ β“˜ π˜›π˜π˜π˜š π˜œπ˜šπ˜Œπ˜™ 𝘐𝘚 π˜œπ˜•π˜ˆπ˜žπ˜ˆπ˜™π˜Œ π˜›π˜π˜Œπ˜  π˜ˆπ˜™π˜Œ π˜π˜• π˜›π˜π˜Œ π˜”π˜π˜Šπ˜™π˜–π˜žπ˜ˆπ˜π˜Œ ]

Seeing a large american flag draped over the largest catholic church in the US is quite putrid.

Genuinely thought this piece would be a beautiful rebuttal of the meritocracy premise. πŸ˜‚ "Inheritance was invented as a performance hack" - catern.com/inheritance....

🚨New Preprint! Did you know that steering vectors from one LM can be transferred and re-used in another LM? We argue this is because token embeddings across LMs share many β€œglobal” and β€œlocal” geometric similarities!
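
(Not the preprint's method, but a quick way to poke at the "local" similarity claim yourself: check whether a token's nearest neighbours in one model's embedding space overlap with its neighbours in another's. GPT-2 small and medium share a tokenizer, which makes the comparison easy.)

```python
import numpy as np
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
E1 = AutoModel.from_pretrained("gpt2").get_input_embeddings().weight.detach().numpy()
E2 = AutoModel.from_pretrained("gpt2-medium").get_input_embeddings().weight.detach().numpy()

def neighbours(E: np.ndarray, idx: int, k: int = 10) -> set:
    """Indices of the k nearest tokens by cosine similarity (self excluded)."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    return set(np.argsort(-sims)[1 : k + 1])

idx = tok.encode(" dog")[0]  # " dog" is a single GPT-2 token
overlap = len(neighbours(E1, idx) & neighbours(E2, idx)) / 10
print(f"neighbour overlap for ' dog': {overlap:.0%}")
```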

After the heyday of forums, blogs and Twitter, where did you go to find your community? The web seems all facades now, with the communities forming in private, inaccessible and unknown :/

If you read code, any kind of code, you'll be the Assembly engineer of the 2050s. In my day, child, we typed into reality the variables incrementing, one by one, to make a ball move across a screen.

Everything is political, and the only people who don't see that are people whose privilege and power has never been challenged.

This stuff is ableist @hcaptcha.com

Captcha anti-bot checkers are ableist. Captcha anti-bot checkers are ableist. Captcha anti-bot checkers are ableist. Captcha anti-bot checkers are ableist.

Also: the amount of faith the OP tweet implies in search engines is quite extraordinary, given that search engines are also pretty bad at pointing towards truth. I’d trust a stochastic parrot augmented with RAG over the average search engine result.

Would it shock you to discover that you can make 100,000 AI inference calls before hitting the carbon cost of a SINGLE bitcoin transaction?
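
The back-of-envelope, for the record; both figures are contested estimates rather than measurements, so swap in your own:

```python
# Assumed mid-range estimates, not measurements.
G_CO2_PER_INFERENCE = 4   # grams CO2e per LLM inference call (assumption)
KG_CO2_PER_BTC_TX = 400   # kg CO2e per on-chain Bitcoin transaction (assumption)

calls = KG_CO2_PER_BTC_TX * 1_000 / G_CO2_PER_INFERENCE
print(f"{calls:,.0f} inference calls per BTC transaction")  # 100,000
```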

Chinese, much like English, has no way to refer to disabled people that is free of negative connotations. Though it's a bit worse: translations in airports often refer to "invalids".