potatolicious.bsky.social
474 posts 145 followers 299 following
Discussion Master
comment in response to post
And yeah, attempts so far at forcing the LLM through a world model are... not great. There's a massive flattening of functionality, and long-tail use cases (where LLMs excel!) are compromised. I'm eagerly awaiting R&D here.
comment in response to post
Agree on both points. I think there are too few evals right now on error distributions. There are comparatively many end-to-end error measures, but these may actually tell us less than we need!
comment in response to post
Case in point: absolutely zero people in this thread disagree on the correct answer to the "counting Rs" problem! There is no epistemological uncertainty, nor theory of mind required to crack this. Just clever prompting and tool use that embraces the fact that this is a big ball 'o probability!
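To make the "tool use" part concrete, here's a minimal sketch (the function names and the tool-calling shape are illustrative, not any particular provider's API): instead of asking the model to count letters itself, you expose a deterministic counting function and have the model route the question to it.

```python
# Minimal sketch of tool use for the "counting Rs" problem.
# The tool itself is deterministic Python; the LLM's only job is to decide
# to call it and relay the result. Names here are illustrative, not any
# particular provider's API.

def count_letter(text: str, letter: str) -> int:
    """Deterministically count occurrences of a letter, case-insensitively."""
    return text.lower().count(letter.lower())

# Registry of tools the model is allowed to call.
TOOLS = {"count_letter": count_letter}

def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a tool call requested by the model and return the result."""
    result = TOOLS[name](**arguments)
    return str(result)

if __name__ == "__main__":
    # What the model is prompted to emit instead of guessing:
    # a structured request to the deterministic tool.
    tool_call = {"name": "count_letter",
                 "arguments": {"text": "strawberry", "letter": "r"}}
    print(handle_tool_call(tool_call["name"], tool_call["arguments"]))  # -> 3
```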
comment in response to post
It overwhelmingly does not. As I said in my initial reply: while LLMs can touch on epistemologically tricky things, the overwhelming majority of use cases we have for them (and of the cases where they are wrong) do not trigger this in the least.
comment in response to post
I think we're in violent agreement - humans do use many internal programs, but that's completely tangential to the original point that set this off. OP's contention is that responding to LLMs being factually incorrect requires contending with the epistemological nature of "fact".
comment in response to post
"Oh, for success at this task I need to specifically configure and prompt the LLM in a way I'd never do with a human" is doing no. 1! And it's extremely important!
comment in response to post
That's kind of my whole point, which I think is twofold: 1) Harnessing the power of LLMs requires understanding deeply how their error distribution differs from a human's. 2) Most of the core blockers to adoption aren't in epistemologically-sensitive things but in much simpler things like this.
comment in response to post
But I think this answer actually reinforces their point! You would never ask a human to count the number of "R"s by writing a program. You would just ask them. You *do* do it to the LLM because you implicitly understand that it has a massively different failure distribution than a person!
comment in response to post
When a law associate asks ChatGPT to write a brief and it hallucinates a bunch of citations to things that never existed - that's a practical failure that has almost nothing to do with epistemology!
comment in response to post
Yeah, it's possible to get an LLM to emit output where reasonable human beings would disagree on its truth-value, and as an intellectual exercise that's very interesting. As a practical matter, in terms of exploiting its capabilities, I'd argue it's not relevant in most cases!
comment in response to post
This describes ~most interactions where the LLM is seen to misbehave. Yeah, if you ask it to opine on [insert controversial topic here] we get into the realm where epistemology matters (assuming you get past the guard rails...)
comment in response to post
The LLM is wrong in an extremely obvious way, and any reasonable observer will agree that it was wrong! More importantly, the *reason* it was wrong is not some underlying epistemological dispute or lack of consensus! The thing it claims exists never did!
comment in response to post
Epistemological truth matters for some queries, but not all, and certainly not for the most common examples of LLM failures. Take my example: when a dev asks a LLM for an API and it hallucinates a thing that never existed, in what way does that concern epistemology?
comment in response to post
When a LLM hallucinates something obviously incorrect in a user-facing product, "what is a fact even" and "humans screw up too" are not useful responses - both as a marketing matter and as a practical means of improving these products.
comment in response to post
I think my core disagreement is that epistemology is not (and should not be) central to this exercise at all, and appeals to it lie somewhere between "unhelpful towards practical use" to "actively harmful to adoption".
comment in response to post
Squishing this all down to "both things can be wrong" elides this extremely consequential difference! They screw up, differently! And if you want to harness the *useful* output of a LLM you need to understand this difference in distribution.
comment in response to post
Because it radically changes the distribution of incorrect outputs across the space of all incorrect outputs. The distribution of incorrect outputs for a human is very different than the distribution of incorrect outputs for a LLM. This is trivially observable in daily use!
comment in response to post
I don't think I am. Humans will often *speak nonsense*, but they do it for radically different reasons than LLMs do (peer pressure, in-grouping, etc.) And this distinction matters! Because LLMs "speak nonsense" differently than humans "speak nonsense"!
comment in response to post
Simplifying this down to "humans screw up too" is somehow simultaneously deeply unfair to *both* things that are being compared. Both are fallible, but the distribution of failure is *radically* different!
comment in response to post
And if your thrust is that I should treat Claude the same way I treat a ranting street preacher... that's a real problem for the product! But also, unfair to the LLM! It is useful in ways that the ranting preacher is not!
comment in response to post
Not usually, no? Again, let's not generalize a problem prematurely. Have I *ever* asked another mobile app developer for an API and received an answer that plainly *did not exist and never did*? No.
comment in response to post
Which is not to say we can't or won't make adaptations to LLM-based products to address these - but the way to make this happen is to confront these shortcomings directly, rather than what feels like a handwave: "humans have problems too" (yes, and they're largely different from LLM problems).
comment in response to post
Likewise, we've built entire algorithms around sifting consensus out of noise - PageRank is one of the earliest and still extremely useful. We also present users with lists of references, such that the consensus (or lack thereof) is easily judged.
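For context, here's a minimal sketch of the core PageRank idea (the textbook power iteration, not anything resembling a production implementation): a node's score is the damped sum of the scores flowing in from nodes that link to it, which is one way of pulling consensus out of a noisy link graph.

```python
# Minimal power-iteration sketch of the PageRank idea: a node's score is the
# damped sum of scores flowing in from nodes that link to it. Textbook
# formulation only; names and the tiny example graph are illustrative.

def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in links.items():
            if not outlinks:
                # Dangling node: spread its rank evenly across all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
            else:
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # B and C both point at A, so A ends up ranked highest.
    graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
    print(pagerank(graph))
```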
comment in response to post
We've built a *lot* of infrastructure around the fallibility of human thought, and to signal to other humans what has consensus and what does not. There are wrong answers on Stackoverflow for example! They're also likely to be downranked or have comments below them.
comment in response to post
"What is a fact anyway" and "aren't humans also stochastic parrots" IMO are unhelpful dodges because they elide actual problems that are unique to the tech.
comment in response to post
Woah, let's reset here. I use LLMs daily, I *ship products based on LLMs*. Mine is not a "this is all useless, throw it away" stance. But we need to reckon with the actual limitations of this tech, how it's presented and pitched to users, and what the practical implications are.
comment in response to post
Like, recently I asked Claude for an API to use if I wanted to do a particular thing on Android. It hallucinated a non-existent API that as far as I can tell never existed in any way, shape, or form. That's not an epistemological problem!
comment in response to post
I feel like this whole "what is a fact even" impulse is a dodge. Of course there are questions for which even reasonable humans have different notions of fact - but the LLM screws up even questions where there is consensus! That's a real problem!
comment in response to post
We know that distribution of training data matters - asking for things that are rare in the training set often results in hallucinations - *even if the underlying training data has consensus about what the "fact" is*. No epistemology required - the LLM is just wrong!
comment in response to post
Heavily disagree. Sure, at the edges "fact" is non-obvious, but we don't have to go to the edges for these LLMs to misbehave.
comment in response to post
+1. They don't solve *all* the problems with cars, but it's pretty much 100% upside from the status quo.
comment in response to post
Fair enough! The Grand Challenge model did attract academic participants, but these engagements often rely on existing school funding anyway (the 10% chance of a payoff is icing on the cake, not existential). Which is to say, I have strong doubts about the "lottery" method of funding research...
comment in response to post
The problem with this approach is that the participants are overwhelmingly academia. "10% chance of big payday for bleeding-edge research, 90% chance of nothing" in an academic setting is tolerable risk, in the private sector it means a company goes under.
comment in response to post
This happens already though! The genesis of self-driving cars was the DARPA Grand Challenge. No funding upfront for any players, but prizes for reaching milestones. Where the rubber meets the road is when the government steps *away*.
comment in response to post
This is an area where the millennial-heavy userbase shows itself. Many (like me!) are struggling to buy homes at unprecedented valuations. It's a real problem. But the majority of the population already own, at costs that are manageable for them. It's an intensely cohort-specific problem.
comment in response to post
Most tech advancements don't change your life in a noticeable way. Products get better, safer, lighter, smaller, cheaper over years through accumulated advancements. Once in a while you get something *really* big (see: smartphones), but most advancement isn't like this.
comment in response to post
Ehhhh disagree a lot with this, as someone with a background in both compsci and robotics. The US is (was?) an absolute powerhouse of the academia -> industry pipeline. Just because a lot of it is quiet doesn't mean it's not there.
comment in response to post
Not just a cost thing, but slab can *radically* simplify water management. Arguably *many* parts of NJ shouldn't have basements given the ubiquity of wet basements. I can see why even new builds would avoid that added complexity, especially in our geography.
comment in response to post
It used to be you had to come with a well-formed thesis about your product, who it's for, and what it should do. Now you just YOLO some code and hope the analytics reveal some secret truth that absolves you from thinking and analysis.
comment in response to post
I feel this intensely about the tech industry. The advent of universal experimentation and deep analytics has removed the need to know *what* product you're building or indeed *why*. The result is an ocean of crappy products that are hoping someone else tells them how they should work.
comment in response to post
Perhaps less internally repressive, but not more democratic, if that makes any sense?
comment in response to post
The “buying off” thing I think is real but doesn’t result in *democracy* but rather the opposite. Illiberalism is justified because the state provides. See for example: China, especially when economic growth rates were much higher.
comment in response to post
Also you can deliberately exploit the airport service for more ridership! Zürich for example built an entire non-airport-related commercial hub around the airport train station.
comment in response to post
One unifying factor in all of these stories is that the users have *very* long, continuous conversations with the LLM. You aren't likely to replicate this behavior in short conversations of a couple dozen turns.
comment in response to post
The key here is "long". On a technical level, we know that adherence to the initial system prompt declines as the context (i.e., chatlog) gets longer. At extreme lengths a lot of the safeties (to the extent they are really safe in the first place) fail.
comment in response to post
And Nintendo's strategy is harder. You need to re-earn cultural relevance with many cohorts of young people over several decades, vs. "hey, you played Halo when you were 12. Wanna play some Halo now that you're 36?" The fact that Mario has been culturally relevant *within different cohorts* is a coup!
comment in response to post
This feels like the continuation of an existing trend. YouTube made "making TV" radically cheaper and has vastly altered the content calculus. AI looks like it may be the next step?
comment in response to post
Isn't this already the case with YouTube? Most people who upload don't get any eyeballs. A minority get a consistent but small viewership, while a tiny slice of mega-accounts get millions of views on everything.
comment in response to post
Yeah, but this process is *super* inefficient and the companies have never really gotten serious about understanding how to do it well. I'm sympathetic to "we need to try a ton of different things to see what sticks", but how most FAANGs do it is pretty absurd.
comment in response to post
Weirdo futurists 💁 🦋 any algorithm Is this the singularity?