allafarce.bsky.social
Software and complex (not complicated) systems. Pursuing the public good, sometimes with technology.
201 posts 9,754 followers 591 following

still thinking about this 4 years later

One of the less common situations where Claude failed me. Luckily Gemini worked like a charm. Scan Costco receipt, grab the prices for select items, sum, divide by 2 (Unsurprisingly this is splitting our Costco run)
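The receipt-splitting step the post describes can be sketched in a few lines, assuming the model has already extracted (item, price) pairs from the scan; the item names and prices below are invented for illustration.

```python
# Minimal sketch of the "sum selected items, divide by 2" step.
# Assumes an LLM (or OCR pass) has already turned the receipt scan
# into a dict of item -> price; these values are made up.

def split_total(items: dict[str, float], ways: int = 2) -> float:
    """Sum the selected item prices and divide evenly."""
    return round(sum(items.values()) / ways, 2)

shared = {"olive oil": 18.99, "paper towels": 24.49, "rotisserie chicken": 4.99}
print(split_total(shared))  # each person's share of the shared items
```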

If you worked for 18F and got fired, group together to start a consulting company. It’s just a matter of time before DOGE needs you to fix the mess they inevitably create. They will have to hire your company as a contractor to fix it. But on your terms. I’m happy to invest and/or help

I'm generally a fan of the "abundance" idea. When I distill it, it seems to me a big part of its state capacity sub-piece could be stated as: government is instrumental to outcomes (namely broad abundance); markets are *also* instrumental to outcomes; means should be flexible

Some super smart and super evil private equity firm is going to hire all of the best people being laid off from the US government and then they are going to charge a lot of money for the services that we later realize we need.

My new book Cafe Stories: Mysteries has just been released on Amazon at amzn.to/4h5xNCk

I published part 2 of our blog posts on how we're building an eval suite to test LLM performance on SNAP issues

Wow, Promptfoo is a killer tool for LLM evals.

Have you found or do you have any recommendable work that uses genAI/LLM for policy /social science research? Self-promotion is welcome!

Software becomes a complex system the second it has human users

The problem I fear DOGE is about to encounter is that, while there's certainly a fair bit of bullshit involved in federal govt technology, there are two distinct categories:
1. Run-of-the-mill bullshit, and
2. Load-bearing bullshit
And it's not *that* easy to distinguish the two.

Yet another joyful moment of using LLMs to quickly do small things: I wrote an extremely simple Chrome extension that will redirect me away from web sites I unconsciously go to, and instead open an Obsidian page to log that, notice that it happened, and take a moment to think why

Update: Felix Salmon (in comments to my post) has much more detail on what apparently happened with the $80 million FEMA funds.

I appreciate Marina speaking very openly about what a lot of us have felt (absolutely including myself) https://reason.com/2025/02/13/i-tried-to-fix-government-tech-for-years-im-fed-up/

Maybe I really should start a “Cooking with Claude” blog or something. I’ve never browned butter (doing sage) and so I… just show it pictures and let it tell me if it’s browned yet.

I am far enough in my engineer-brained journey of learning to cook well that when I find a recipe in a cookbook that is clearly missing acid I get outraged, as if I’ve encountered an elaborate fraud scheme fleecing thousands.

Our senior dog has diabetes and her continuous glucose monitor alarm was going off 1-3x every hour, all night for the last 3 nights, showing extreme low blood sugar. Turns out the CGM was broken. Everyone has lost in this situation.

Whenever someone makes a confident claim about how LLMs are failing to answer or do something correctly, I run the mental exercise of how you would ensure correctness if it were humans instead, and 95% of the time that strategy works for a system of LLM calls.

this might be a dog catches the car moment. because if you can't find fraud in the Medicare and Medicaid programs, that's really going to raise questions of competence. but if you do find fraud in the Medicare and Medicaid programs, you trigger some really mean and well resourced enemies.

What are the best writeups you've seen on domain-specific evals for LLMs? cc @simonwillison.net (I've got something cool to show soon!)

Been waiting for someone to test this and see if it works - can multiple AI agents fact-checking each other reduce hallucinations? The answer appears to be yes - using 3 agents with a structured review process reduced hallucination scores by 96% across 310 test cases. arxiv.org/pdf/2501.13946
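The paper's exact protocol isn't reproduced here; this is a minimal sketch of the multi-agent cross-checking idea, with `call_llm` as a stubbed placeholder for a real model API and all prompt wording invented for illustration.

```python
# Sketch of a 3-agent draft -> peer-review -> merge loop for
# hallucination reduction. `call_llm` is a placeholder; swap in a
# real model API call to use this.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    raise NotImplementedError

def cross_check(question: str, n_agents: int = 3, ask=call_llm) -> str:
    # Step 1: each agent drafts an independent answer.
    drafts = [ask(f"Answer concisely: {question}") for _ in range(n_agents)]
    # Step 2: each agent revises its draft against the others,
    # dropping claims its peers do not support.
    reviews = []
    for i, draft in enumerate(drafts):
        others = "\n".join(d for j, d in enumerate(drafts) if j != i)
        reviews.append(ask(
            f"Question: {question}\nPeer answers:\n{others}\n"
            f"Your draft: {draft}\n"
            "Revise your draft, removing any claim your peers do not support."
        ))
    # Step 3: a final pass merges the reviewed answers into one.
    return ask(
        f"Question: {question}\nReviewed answers:\n" + "\n".join(reviews) +
        "\nProduce a single consensus answer."
    )
```

With 3 agents this costs 7 model calls per question (3 drafts + 3 reviews + 1 merge), which is the usual trade-off with review-style ensembles.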

I earnestly believe one of the best ways to assess AI models is to use them to augment your cooking. Why?
- Low risk
- Can add value at almost any level of cooking ability
- If you know cooking well, you can see where it does well or falls down (and how specifically)
I use Claude daily for this.

Since someone (@himself.bsky.social?) thinks my scattered mutterings on the US administrative state are worthwhile to some, I'll note one thing I'm watching right now: If the personnel actions' escalation path is the courts, will this Admin quickly hit a capacity constraint on lawyers to defend?

Some mass-follow catalyst happened today for me. Not sure what it is but this is a fairly well targeted way for someone to help me find out with a reply.

Possibly the best use of large language models I've found yet: Give it a recipe, and ask it to really dial it in/make suggestions on unwritten steps that would make it much tastier (or healthier, or more satiating)

If you are 1) a federal civil servant/contractor ordered not to release scheduled data or reports, or 2) an academic who has lost access to government data in the past week ...please get in touch. I'm happy to keep our conversations confidential. crampell[@]washpost[dot]com

The Fifth Risk was a hell of a book.

Intellectual honesty is a high individual virtue. But a frustrating reality that people who exhibit it must overcome to effect change in the world is not that other people are dishonest, really, but that systems are inevitably dishonest with themselves.

A few observations:
- Claude is an amazing dietician
- I could absolutely use the help of a dietician
- I would almost definitely never go to see a dietician as such today, but am getting 95% of what I need via Claude

lol, Claude, you ham you.

The more random things I test using AI for, the more I feel it...

Google Gemini Deep Research, first go of it! And it... just rejects my prompt. Well then. (It must have had to do with the mention of government safety net benefits / SNAP / Medicaid, but, boy, is that an aggressive safety filter.)

Government is instrumental, not an end. Companies are instrumental, not an end. Organizations are instrumental, not an end. Markets are instrumental, not an end.

Claude: "I actually diverge from pure GTD here..." Me: "Can you say more about how you developed that perspective?" (I really enjoyed this exchange — and it's gesturing towards *something* interesting more generally.)

Concise version: "start where you are" means
(a) accepting the limitations of wherever you're starting in the work
(b) doing whatever DOES NOT REQUIRE SOMEONE ELSE to do something (IMPORTANT)
(c) bootstrapping from (b) and repeating (a)