As I like to point out, in programming your LLM-generated product can be tested with the "compile" button (plus various kinds of testing and staged deployment) while in law your LLM-generated product is tested with the "file in court" button and that's way too late. - ThreadSky

tznkai.bsky.social • 13 days ago

As I like to point out, in programming your LLM-generated product can be tested with the "compile" button (plus various kinds of testing and staged deployment) while in law your LLM-generated product is tested with the "file in court" button and that's way too late.

Reposted from Kathryn Tewson

Ah, got it. Yeah, I think they have a lot more utility in an environment where you can trivially validate the correctness of the output without risk than in one where you only learn if they were right or not after the damage is already done.

Comments

tznkai.bsky.social•13 days ago

Most of the problem still seems to be people not understanding when to hook up *any* unvalidated output to anything. The particular failure modes of LLMs are in many ways beside the point once you know that there are any (there are a lot, in fact). What do you do with inputs that might have failed?

digifox.binaryden.net•13 days ago

I think the problem is that generative AI makes it hard to realistically reason about failure rates because failures can be partial and can look indistinguishable to the naked eye from success.

There's no readily available ground truth for complex queries. Benchmarks try, but...

tznkai.bsky.social•13 days ago

Okay, but you know who else makes partial mistakes that can look indistinguishable to the naked eye from success?

digifox.binaryden.net•13 days ago

Sure, but the difference is that the types of mistakes generative AI makes are often not the types of mistakes that a human might make, and so we're not as skilled at recognizing them. And the general public probably will never be as skilled at recognizing those failure modes.

digifox.binaryden.net•13 days ago

I guess the problem I have is that we're handing this stuff to people who aren't computer scientists and expecting them to use it responsibly and by and large they don't.

tznkai.bsky.social•13 days ago

Honestly, I think the problem comes as much from the side of some computer scientists who think they have invented machine God as from anything else. A lot of people are incentivized to present this as a miracle.

vector-of-bool.bsky.social•13 days ago

This got me thinking about human mistakes, and it seems that any given person will either (A) hedge with "I'm not sure, but I think…", or (B) Leroy Jenkins into saying something totally wrong. Whether a person does (A) or (B) seems intrinsic to their personality. LLMs are exclusively in camp (B).

vector-of-bool.bsky.social•13 days ago

Most people, especially educated people, will hedge or refuse to answer questions when they lack confidence.

There are definitely experts who assert falsehoods outside of their field of expertise, and it's more insidious since those come from "a person of authority".

tznkai.bsky.social•13 days ago

This is a question we face constantly in designing machines, sure, but also in designing human systems. We do it in games, in how we set rules for our kids, and in how we train new employees. Mistakes, errors, accidents, and failures are a given in life, but somehow we treat the LLM's failures as sui generis.

tznkai.bsky.social•13 days ago

(@ed3d.net has written some smart stuff about how to use LLMs iteratively in coding in a productive way, and I am sure there are particulars to LLMs that you need to know how to handle, but I don't think that's the major contributor to bad thinking about LLMs or so-called AI.)

tznkai.bsky.social•13 days ago

I suspect it's because when we think "AI" we want it to have all of the reasoning ability of a human intelligence but also the mechanical reliability of a calculator, of which any computer is an overgrown version. But the LLM isn't that kind of reliable! (Nor that kind of smart.)

tznkai.bsky.social•13 days ago

Instead, I think you have to treat it as having both a major downside risk of computers, that of rapidly doing something unmonitored, and a major downside risk of humans, that of nondeterministic behavior, and set guardrails appropriately. That is, integrate LLMs into a system.
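As a rough illustration of what "setting guardrails" could mean in code, here is a minimal sketch, not anyone's actual system. It assumes a hypothetical `generate` callable standing in for an LLM call and a hypothetical `validate` function encoding your business rules. Nondeterministic output is checked before anything downstream acts on it, retries are bounded, and anything that never passes is escalated to a human instead of being used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    value: Optional[str]      # validated output, or None if nothing passed
    attempts: int             # how many generations were tried
    needs_human_review: bool  # True when every attempt failed validation


def guarded_generate(
    generate: Callable[[str], str],   # hypothetical: wraps an LLM call
    validate: Callable[[str], bool],  # hypothetical: encodes business rules
    prompt: str,
    max_attempts: int = 3,
) -> Outcome:
    """Never hand unvalidated output downstream; retry, then escalate."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt)
        if validate(candidate):
            return Outcome(candidate, attempt, needs_human_review=False)
    # Bounded retries: don't loop forever on a nondeterministic source,
    # and don't silently pass along the last (failed) candidate.
    return Outcome(None, max_attempts, needs_human_review=True)
```

The key design choice is that the failure path returns no value at all, so a caller cannot accidentally use output that never passed validation.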

tznkai.bsky.social•13 days ago

But that's hard, and involves a lot of work, and negotiating between stakeholders and determining business logic and signing your name on a project plan that could fail.

michaelander45.bsky.social•13 days ago

Correct, though unlike human employees, the LLM system you are paying to deploy does not improve over time from exposure and experience alone.

chbarts.bsky.social•13 days ago

The term "AI" has never been just about human-like cognition. It's always been a grab-bag of techniques and projects, only some of which were aimed at creating human-like software.

thewanderingjew.bsky.social•13 days ago

yeah - and I think also the perception people have is that maybe it doesn't have human intelligence yet, but that what it has instead is, like, less-effective human thinking.

ed3d.net•13 days ago

I kind of agree with that in logical tasks. Like, when programming, I regularly feel like it is a junior with a lot of energy but a bunch of holes in its thinking.

horton.hearsa.foo•13 days ago

To be fair, testing “do all these citations exist in the legal databases I have access to” is relatively straightforward, at least.

tznkai.bsky.social•13 days ago

True, but I would be really worried about citing a case for the opposite of its holding, which is much harder to write an automated test for. At that point, maybe you should have just done the substantive work in the first place, or at least moved the LLM generation way left on your timeline.

horton.hearsa.foo•13 days ago

(And I mean in an automated way)
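A minimal sketch of the "relatively straightforward" half of this exchange: checking that every cited authority exists. It assumes the citations have already been extracted as strings and that a plain set stands in for whatever legal database you can actually query. Both are hypothetical simplifications. As the thread points out, this catches fabricated citations only. It cannot tell whether a real case actually supports the proposition it is cited for.

```python
from typing import Iterable


def normalize(cite: str) -> str:
    """Collapse runs of whitespace so '410  U.S. 113' matches '410 U.S. 113'."""
    return " ".join(cite.split())


def missing_citations(cited: Iterable[str], known: set[str]) -> list[str]:
    """Return cited authorities not found in the database, in citation order.

    Existence check only: a real case cited for the opposite of its
    holding passes this test.
    """
    known_norm = {normalize(k) for k in known}
    missing: list[str] = []
    seen: set[str] = set()
    for cite in cited:
        key = normalize(cite)
        if key not in known_norm and key not in seen:
            missing.append(cite)  # report each missing citation once
            seen.add(key)
    return missing
```

For example, `missing_citations(["410 U.S. 113", "123 F.9th 456"], {"410 U.S. 113"})` flags only the nonexistent `"123 F.9th 456"`.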
