allafarce.bsky.social
Software and complex (not complicated) systems. Pursuing the public good, sometimes with technology.
201 posts 9,754 followers 591 following

still thinking about this 4 years later

One of the less common situations where Claude failed me. Luckily Gemini worked like a charm. Scan Costco receipt, grab the prices for select items, sum, divide by 2 (Unsurprisingly this is splitting our Costco run)
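The receipt-splitting step the post describes can be sketched in a few lines, assuming the model has already extracted (item, price) pairs from the scan; the item names and prices below are invented for illustration.

```python
# Minimal sketch of the "sum selected items, divide by 2" step.
# Assumes an LLM (or OCR pass) has already turned the receipt scan
# into a dict of item -> price; these values are made up.

def split_total(items: dict[str, float], ways: int = 2) -> float:
    """Sum the selected item prices and divide evenly."""
    return round(sum(items.values()) / ways, 2)

shared = {"olive oil": 18.99, "paper towels": 24.49, "rotisserie chicken": 4.99}
print(split_total(shared))  # each person's share of the shared items
```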

If you worked for 18F and got fired, group together to start a consulting company. It’s just a matter of time before DOGE needs you to fix the mess they inevitably create. They will have to hire your company as a contractor to fix it. But on your terms. I’m happy to invest and/or help

I'm generally a fan of the "abundance" idea. When I distill it, it seems to me a big part of its state capacity sub-piece could be stated as: government is instrumental to outcomes (namely broad abundance); markets are *also* instrumental to outcomes; means should be flexible

Some super smart and super evil private equity firm is going to hire all of the best people being laid off from the US government and then they are going to charge a lot of money for the services that we later realize we need.

My new book Cafe Stories: Mysteries has just been released on Amazon at amzn.to/4h5xNCk

I published part 2 of our blog posts on how we're building an eval suite to test LLM performance on SNAP issues

Wow, Promptfoo is a killer tool for LLM evals.

Have you found or do you have any recommendable work that uses genAI/LLM for policy /social science research? Self-promotion is welcome!

Software becomes a complex system the second it has human users

The problem I fear DOGE is about to encounter is that, while there's certainly a fair bit of bullshit involved in federal govt technology, there are two distinct categories:
1. Run-of-the-mill bullshit, and
2. Load-bearing bullshit
And it's not *that* easy to distinguish the two.

Yet another joyful moment of using LLMs to quickly do small things: I wrote an extremely simple Chrome extension that will redirect me away from web sites I unconsciously go to, and instead open an Obsidian page to log that, notice that it happened, and take a moment to think why

Update: Felix Salmon (in comments to my post) has much more detail on what apparently happened with the $80 million FEMA funds.

I appreciate Marina speaking very openly about what a lot of us have felt (absolutely including myself) https://reason.com/2025/02/13/i-tried-to-fix-government-tech-for-years-im-fed-up/

Maybe I really should start a “Cooking with Claude” blog or something. I’ve never browned butter (doing sage) and so I… just show it pictures and let it tell me if it’s browned yet.

I am far enough in my engineer-brained journey of learning to cook well that when I find a recipe in a cookbook that is clearly missing acid I get outraged, as if I’ve encountered an elaborate fraud scheme fleecing thousands.

Our senior dog has diabetes and her continuous glucose monitor alarm was going off 1-3x every hour, all night for the last 3 nights, showing extreme low blood sugar. Turns out the CGM was broken. Everyone has lost in this situation.

Whenever someone makes a confident claim about how LLMs are failing to answer or do something correctly, I run the mental exercise of how you would ensure correctness if it were humans instead, and 95% of the time that strategy works for a system of LLM calls.

this might be a dog catches the car moment. because if you can't find fraud in the Medicare and Medicaid programs, that's really going to raise questions of competence. but if you do find fraud in the Medicare and Medicaid programs, you trigger some really mean and well resourced enemies.

What are the best writeups you've seen on domain-specific evals for LLMs? cc @simonwillison.net (I've got something cool to show soon!)

Been waiting for someone to test this and see if it works - can multiple AI agents fact-checking each other reduce hallucinations? The answer appears to be yes - using 3 agents with a structured review process reduced hallucination scores by 96% across 310 test cases. arxiv.org/pdf/2501.13946
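The paper's exact protocol isn't reproduced here; this is a minimal sketch of the multi-agent cross-checking idea, with `call_llm` as a stubbed placeholder for a real model API and all prompt wording invented for illustration.

```python
# Sketch of a 3-agent draft -> peer-review -> merge loop for
# hallucination reduction. `call_llm` is a placeholder; swap in a
# real model API call to use this.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    raise NotImplementedError

def cross_check(question: str, n_agents: int = 3, ask=call_llm) -> str:
    # Step 1: each agent drafts an independent answer.
    drafts = [ask(f"Answer concisely: {question}") for _ in range(n_agents)]
    # Step 2: each agent revises its draft against the others,
    # dropping claims its peers do not support.
    reviews = []
    for i, draft in enumerate(drafts):
        others = "\n".join(d for j, d in enumerate(drafts) if j != i)
        reviews.append(ask(
            f"Question: {question}\nPeer answers:\n{others}\n"
            f"Your draft: {draft}\n"
            "Revise your draft, removing any claim your peers do not support."
        ))
    # Step 3: a final pass merges the reviewed answers into one.
    return ask(
        f"Question: {question}\nReviewed answers:\n" + "\n".join(reviews) +
        "\nProduce a single consensus answer."
    )
```

With 3 agents this costs 7 model calls per question (3 drafts + 3 reviews + 1 merge), which is the usual trade-off with review-style ensembles.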

I earnestly believe one of the best ways to assess AI models is to use them to augment your cooking. Why?
- Low risk
- Can add value at almost any level of cooking ability
- If you know cooking well, you can see where it does well or falls down (and how specifically)
I use Claude daily for this.

Since someone (@himself.bsky.social?) thinks my scattered mutterings on the US administrative state are worthwhile to some, I'll note one thing I'm watching right now: If the personnel actions' escalation path is the courts, will this Admin quickly hit a capacity constraint on lawyers to defend?

Some mass-follow catalyst happened today for me. Not sure what it is but this is a fairly well targeted way for someone to help me find out with a reply.

Possibly the best use of large language models I've found yet: Give it a recipe, and ask it to really dial it in/make suggestions on unwritten steps that would make it much tastier (or healthier, or more satiating)

If you are 1) a federal civil servant/contractor ordered not to release scheduled data or reports, or 2) an academic who has lost access to government data in the past week ...please get in touch. I'm happy to keep our conversations confidential. crampell[@]washpost[dot]com

The Fifth Risk was a hell of a book.

Intellectual honesty is a high individual virtue. But a frustrating reality that people who exhibit it must overcome to effect change in the world is not that other people are dishonest, really, but that systems are inevitably dishonest with themselves.

A few observations:
- Claude is an amazing dietician
- I could absolutely use the help of a dietician
- I would almost definitely never go to see a dietician as such today, but am getting 95% of what I need via Claude

lol, Claude, you ham you.

The more random things I test using AI for, the more I feel it...

Google Gemini Deep Research, first go of it! And it... just rejects my prompt. Well then. (It must have had to do with the mention of government safety net benefits / SNAP / Medicaid, but, boy, is that an aggressive safety filter.)

Government is instrumental, not an end. Companies are instrumental, not an end. Organizations are instrumental, not an end. Markets are instrumental, not an end.

Claude: "I actually diverge from pure GTD here..." Me: "Can you say more about how you developed that perspective?" (I really enjoyed this exchange — and it's gesturing towards *something* interesting more generally.)

Concise version: "start where you are" means
(a) accepting the limitations of wherever you're starting in the work
(b) doing whatever DOES NOT REQUIRE SOMEONE ELSE to do something (IMPORTANT)
(c) bootstrapping from (b) and repeating (a)