As I like to point out, in programming your LLM-generated product can be tested with the "compile" button (plus various kinds of testing and staged deployment), while in law your LLM-generated product is tested with the "file in court" button, and that's way too late.
Reposted from
Kathryn Tewson
Ah, got it. Yeah, I think they have a lot more utility in an environment where you can trivially validate the correctness of the output without risk than in one where you only learn if they were right or not after the damage is already done.
Comments
There's no readily available ground truth for complex queries. Benchmarks try, but...
There are definitely experts who assert falsehoods outside their field of expertise, and that's more insidious because those falsehoods come from "a person of authority".