potatolicious.bsky.social
474 posts 145 followers 299 following
Discussion Master
comment in response to post
And yeah, attempts so far at forcing the LLM through a world model are... not great. There's a massive flattening of functionality, and long-tail use cases (where LLMs excel!) are compromised. I'm eagerly awaiting R&D here.
comment in response to post
Agree on both points. I think there are too few evals right now on error distributions. There are comparatively many end-to-end error measures, but these may actually tell us less than we need!
comment in response to post
Case in point: absolutely zero people in this thread disagree on the correct answer to the "counting Rs" problem! There is no epistemological uncertainty, nor theory of mind required to crack this. Just clever prompting and tool use that embraces the fact that this is a big ball 'o probability!
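To make the "tool use" part concrete, here's a minimal sketch (the function names and the tool-calling shape are illustrative, not any particular provider's API): instead of asking the model to count letters itself, you expose a deterministic counting function and have the model route the question to it.

```python
# Minimal sketch of tool use for the "counting Rs" problem.
# The tool itself is deterministic Python; the LLM's only job is to decide
# to call it and relay the result. Names here are illustrative, not any
# particular provider's API.

def count_letter(text: str, letter: str) -> int:
    """Deterministically count occurrences of a letter, case-insensitively."""
    return text.lower().count(letter.lower())

# Registry of tools the model is allowed to call.
TOOLS = {"count_letter": count_letter}

def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a tool call requested by the model and return the result."""
    result = TOOLS[name](**arguments)
    return str(result)

if __name__ == "__main__":
    # What the model is prompted to emit instead of guessing:
    # a structured request to the deterministic tool.
    tool_call = {"name": "count_letter",
                 "arguments": {"text": "strawberry", "letter": "r"}}
    print(handle_tool_call(tool_call["name"], tool_call["arguments"]))  # -> 3
```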
comment in response to post
It overwhelmingly does not. As I said in my initial reply: while LLMs can touch on epistemologically tricky things, the overwhelming majority of use cases we have for them (and of the cases where they are wrong) do not trigger this in the least.
comment in response to post
I think we're in violent agreement - humans do use many internal programs, but that's completely tangential to the original point that set this off. OP's contention is that responding to LLMs being factually incorrect requires contending with the epistemological nature of "fact".
comment in response to post
"Oh, for success at this task I need to specifically configure and prompt the LLM in a way I'd never do with a human" is doing no. 1! And it's extremely important!
comment in response to post
That's kind of my whole point, which I think is twofold: 1) Harnessing the power of LLMs requires understanding deeply how their error distribution differs from a human's. 2) Most of the core blockers to adoption aren't in epistemologically-sensitive things but in much simpler things like this.
comment in response to post
But I think this answer actually reinforces their point! You would never ask a human to count the number of "R"s by writing a program. You would just ask them. You *do* do it to the LLM because you implicitly understand that it has a massively different failure distribution than a person!
comment in response to post
When a law associate asks ChatGPT to write a brief and it hallucinates a bunch of citations to things that never existed - that's a practical failure that has almost nothing to do with epistemology!
comment in response to post
Yeah, it's possible to get an LLM to emit output where reasonable human beings would disagree on its truth-value, and as an intellectual exercise that's very interesting. As a practical matter, in terms of exploiting its capabilities, I'd argue it's not relevant in most cases!
comment in response to post
This describes ~most interactions where the LLM is seen to misbehave. Yeah, if you ask it to opine on [insert controversial topic here] we get into the realm where epistemology matters (assuming you get past the guard rails...)
comment in response to post
The LLM is wrong in an extremely obvious way, and any reasonable observer will agree that it was wrong! More importantly, the *reason* it was wrong is not some underlying epistemological dispute or lack of consensus! The thing it claims exists never did!
comment in response to post
Epistemological truth matters for some queries, but not all, and certainly not for the most common examples of LLM failures. Take my example: when a dev asks a LLM for an API and it hallucinates a thing that never existed, in what way does that concern epistemology?
comment in response to post
When a LLM hallucinates something obviously incorrect in a user-facing product, "what is a fact even" and "humans screw up too" are not useful responses - both as a marketing matter and as a practical means of improving these products.
comment in response to post
I think my core disagreement is that epistemology is not (and should not be) central to this exercise at all, and appeals to it lie somewhere between "unhelpful towards practical use" to "actively harmful to adoption".
comment in response to post
Squishing this all down to "both things can be wrong" elides this extremely consequential difference! They screw up, differently! And if you want to harness the *useful* output of a LLM you need to understand this difference in distribution.
comment in response to post
Because it radically changes the distribution of incorrect outputs across the space of all incorrect outputs. The distribution of incorrect outputs for a human is very different than the distribution of incorrect outputs for a LLM. This is trivially observable in daily use!
comment in response to post
I don't think I am. Humans will often *speak nonsense*, but they do it for radically different reasons than LLMs do (peer pressure, in-grouping, etc.) And this distinction matters! Because LLMs "speak nonsense" differently than humans "speak nonsense"!
comment in response to post
Simplifying this down to "humans screw up too" is somehow simultaneously deeply unfair to *both* things that are being compared. Both are fallible, but the distribution of failure is *radically* different!
comment in response to post
And if your thrust is that I should treat Claude the same way I treat a ranting street preacher... that's a real problem for the product! But also, unfair to the LLM! It is useful in ways that the ranting preacher is not!
comment in response to post
Not usually, no? Again, let's not generalize a problem prematurely. Have I *ever* asked another mobile app developer for an API and received an answer that plainly *did not exist and never did*? No.
comment in response to post
Which is not to say we can't or won't make adaptations to LLM-based products to address these - but the way to make this happen is to confront these shortcomings directly, rather than what feels like a handwave: "humans have problems too" (yes, and they're largely different from LLM problems).
comment in response to post
Likewise, we've built entire algorithms around sifting consensus out of noise - PageRank is one of the earliest and still extremely useful. We also present users with lists of references, such that the consensus (or lack thereof) is easily judged.
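For context, here's a minimal sketch of the core PageRank idea (the textbook power iteration, not anything resembling a production implementation): a node's score is the damped sum of the scores flowing in from nodes that link to it, which is one way of pulling consensus out of a noisy link graph.

```python
# Minimal power-iteration sketch of the PageRank idea: a node's score is the
# damped sum of scores flowing in from nodes that link to it. Textbook
# formulation only; names and the tiny example graph are illustrative.

def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in links.items():
            if not outlinks:
                # Dangling node: spread its rank evenly across all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
            else:
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # B and C both point at A, so A ends up ranked highest.
    graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
    print(pagerank(graph))
```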
comment in response to post
We've built a *lot* of infrastructure around the fallibility of human thought, and to signal to other humans what has consensus and what does not. There are wrong answers on Stackoverflow for example! They're also likely to be downranked or have comments below them.
comment in response to post
"What is a fact anyway" and "aren't humans also stochastic parrots" IMO are unhelpful dodges because they elide actual problems that are unique to the tech.
comment in response to post
Woah, let's reset here. I use LLMs daily, I *ship products based on LLMs*. Mine is not a "this is all useless, throw it away" stance. But we need to reckon with the actual limitations of this tech, how it's presented and pitched to users, and what the practical implications are.
comment in response to post
Like, recently I asked Claude for an API to use if I wanted to do a particular thing on Android. It hallucinated a non-existent API that as far as I can tell never existed in any way, shape, or form. That's not an epistemological problem!
comment in response to post
I feel like this whole "what is a fact even" impulse is a dodge. Of course there are questions for which even reasonable humans have different notions of fact - but the LLM screws up even questions where there is consensus! That's a real problem!
comment in response to post
We know that distribution of training data matters - asking for things that are rare in the training set often results in hallucinations - *even if the underlying training data has consensus about what the "fact" is*. No epistemology required - the LLM is just wrong!
comment in response to post
Heavily disagree. Sure, at the edges "fact" is non-obvious, but we don't have to go to the edges for these LLMs to misbehave.
comment in response to post
+1. They don't solve *all* the problems with cars, but it's pretty much 100% upside from the status quo.
comment in response to post
Fair enough! The Grand Challenge model did attract academic participants, but these engagements often rely on existing school funding anyway (the 10% chance of a payoff is icing on the cake, not existential). Which is to say, I have strong doubts about the "lottery" method of funding research...
comment in response to post
The problem with this approach is that the participants are overwhelmingly academia. "10% chance of big payday for bleeding-edge research, 90% chance of nothing" in an academic setting is tolerable risk, in the private sector it means a company goes under.
comment in response to post
This happens already though! The genesis of self-driving cars was the DARPA Grand Challenge. No funding upfront for any players, but prizes for reaching milestones. Where the rubber meets the road is when the government steps *away*.
comment in response to post
This is an area where the millennial-heavy userbase shows itself. Many (like me!) are struggling to buy homes at unprecedented valuations. It's a real problem. But the majority of the population already own, at costs that are manageable for them. It's an intensely cohort-specific problem.
comment in response to post
Most tech advancements don't change your life in a noticeable way. Products get better, safer, lighter, smaller, cheaper over years through accumulated advancements. Once in a while you get something *really* big (see: smartphones), but most advancement isn't like this.
comment in response to post
Ehhhh disagree a lot with this, as someone with a background in both compsci and robotics. The US is (was?) an absolute powerhouse of the academia -> industry pipeline. Just because a lot of it is quiet doesn't mean it's not there.
comment in response to post
Not just a cost thing, but slab can *radically* simplify water management. Arguably *many* parts of NJ shouldn't have basements given the ubiquity of wet basements. I can see why even new builds would avoid that added complexity, especially in our geography.
comment in response to post
It used to be you had to come with a well-formed thesis about your product, who it's for, and what it should do. Now you just YOLO some code and hope the analytics reveal some secret truth that absolves you from thinking and analysis.
comment in response to post
I feel this intensely about the tech industry. The advent of universal experimentation and deep analytics has removed the need to know *what* product you're building or indeed *why*. The result is an ocean of crappy products that are hoping someone else tells them how they should work.
comment in response to post
Perhaps less internally repressive, but not more democratic, if that makes any sense?
comment in response to post
The “buying off” thing I think is real but doesn’t result in *democracy* but rather the opposite. Illiberalism is justified because the state provides. See for example: China, especially when economic growth rates were much higher.
comment in response to post
Also you can deliberately exploit the airport service for more ridership! Zürich for example built an entire non-airport-related commercial hub around the airport train station.
comment in response to post
One unifying factor in all of these stories is that the users have *very* long, continuous conversations with the LLM. You aren't likely to replicate this behavior in short conversations of a couple dozen turns.
comment in response to post
The key here is "long". On a technical level, we know that adherence to the initial system prompt declines as the context (i.e., chatlog) gets longer. At extreme lengths a lot of the safeties (to the extent they are really safe in the first place) fail.
comment in response to post
And Nintendo's strategy is harder. You need to re-earn cultural relevance with many cohorts of young people over several decades, vs. "hey, you played Halo when you were 12. Wanna play some Halo now that you're 36?" The fact that Mario has been culturally relevant *within different cohorts* is a coup!
comment in response to post
This feels like the continuation of an existing trend. YouTube made "making TV" radically cheaper and has vastly altered the content calculus. AI looks like it may be the next step?
comment in response to post
Isn't this already the case with YouTube? Most people who upload don't get any eyeballs. A minority get a consistent but small viewership, while a tiny slice of mega-accounts get millions of views on everything.
comment in response to post
Yeah, but this process is *super* inefficient and the companies have never really gotten serious about understanding how to do it well. I'm sympathetic to "we need to try a ton of different things to see what sticks", but how most FAANGs do it is pretty absurd.
comment in response to post
Weirdo futurists 💁 🦋 any algorithm Is this the singularity?