ai-notes.bsky.social
The value of a person in no way depends on their intelligence.
109 posts
538 followers
483 following
comment in response to
post
I think for most tasks, the bottleneck is reliability, not capability. So even though capability is definitely increasing on some dimensions (for whatever reason, scaling or otherwise, I don't know), most people just don't notice. Very, very few people need the math abilities of o1-preview.
comment in response to
post
To put it another way: some folks in the NLP community would be horrified if they knew what people actually use search engines for!
comment in response to
post
It's a funny analogy, but I think the situation might be subtler than this. People use search engines for all sorts of things, not just information retrieval. For some of these other tasks, isn't it conceivable that AI would be more fit for purpose?
comment in response to
post
People in science and technology are seeing something very different from people in the humanities, but I think that's a temporary phase.
comment in response to
post
Isn't this just a matter of different subdisciplines using the word "model" in different ways? I feel like I'm watching a mathematician complaining that fields aren't just a bunch of grass; they have to be commutative.
comment in response to
post
Real-world usage spans a very broad set of tasks. Look at the data yourself if you don't believe me, e.g.:
www.nber.org/papers/w32966
And true generality is definitely an engineering goal—it's the famous G in "AGI." All frontier model companies are public and explicit about this.
comment in response to
post
I don't know of any technology adopted as fast as ChatGPT. Examples that are close (personal computers, the internet) indeed became pervasive and foundational. E.g. see www.stlouisfed.org/on-the-econo...
comment in response to
post
I've met a lot of people who are 100% certain that AI will flop. That's probably who this kind of language is aimed at. I completely agree it would be better if they hedged and said, "There's a decent chance AI will be pervasive, and we want you to help decide how we use it."
comment in response to
post
LLM-based chatbots are built for general use and in practice are used for a wide variety of things. I'm genuinely curious: what leads you to see them as application-specific artifacts? Or is this more of a normative statement, that you wish they'd be built and used in a more targeted way?
comment in response to
post
I think it sets a baseline, but not a ceiling. And LLMs have blown way past my baseline expectations for what I guessed next-token prediction would produce. Isn't it at least a reasonable hypothesis that they may be learning something deep as a byproduct of a superficial training task?
comment in response to
post
LLMs are a technique, not a tool: they're not "meant" for anything. (Is the fast Fourier transform "meant" for audio engineering or detecting nuclear tests? Why not both?) And at this point, the best LLM-based systems are far better than the average person at math. Surely that's worth exploring?
comment in response to
post
Oh, I see what you're saying! That is interesting, and I don't know of any studies.
comment in response to
post
The belief was that this made it easier to learn to translate the first word, which then made it easier to learn to translate the second, etc. I don't know if they ran careful experiments to show this was the mechanism.
comment in response to
post
I think there might be more to the story. One of the biggest AI believers I know (1) is a socially adept extrovert and (2) was incredibly skeptical, right up until LLMs became good enough to help him write a certain type of specialized code much faster.
comment in response to
post
I believe you. There seem to be dramatic differences between subdisciplines. In your work it's useless, but in chemistry, it just won a Nobel. As we figure out what universities should do, I find it helpful to take into account how different our various experiences are.
comment in response to
post
I think her analysis of the structural pressures on universities is excellent! But what I'm seeing on the ground is a mix of those pressures with "endogenous" aspects of the technology itself: its enormous utility for certain kinds of work, and its rapid improvement. Those are critical factors, too.
comment in response to
post
Excellent mini-talk! One missing variable is that many profs (in physics, chemistry, CS) are now finding AI extremely useful for their own work. That makes it harder to see as a "cheating device." This seems like a huge factor in the "pivot," and one that may not be equally visible in all disciplines.
comment in response to
post
So is it fair to say your level of belief (or disbelief) would be the same if they'd used the p < 0.05 standard?
comment in response to
post
I suppose the converse question is interesting too: what grand-but-incorrect discoveries would we have made without an understanding of null hypothesis testing?
comment in response to
post
Great essay! You ask, "What are the grand discoveries that we wouldn’t have made without an understanding of null hypothesis testing?" Would the discovery of the Higgs boson count? As I understand it, the transition from "cool theory" to "Nobel prize" hinged on a p-value.
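For the curious, the arithmetic behind that threshold is easy to check. Here's a minimal sketch in Python (assuming scipy is available): the particle-physics "5 sigma" standard used for the Higgs announcement corresponds to a one-sided p-value of roughly 3e-7.

from scipy.stats import norm

# One-sided tail probability P(Z > 5) for a standard normal:
# the "5 sigma" discovery threshold.
p_value = norm.sf(5.0)
print(f"p-value at 5 sigma: {p_value:.2e}")  # ~2.87e-07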
comment in response to
post
Yep! The argument in your paper makes sense. It was just the nonstandard use of "structural stability" that threw me. (In standard usage, e.g., the identity map on a manifold is *not* structurally stable.) Anyway, it's a great article, whatever terminology you use!
comment in response to
post
Very likely nothing will change for one inference pass, by continuity. But it's entirely possible that after many more next-token inferences you'll see a change large enough to affect which output token is produced. (This is much like roundoff error accumulating.)
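To make the accumulation intuition concrete, here's a toy sketch in Python (a chaotic logistic map, not a transformer; the parameter values and threshold are purely illustrative). A 1e-12 perturbation to one parameter changes essentially nothing on a single step, but after repeated iterations it flips a thresholded discrete "output":

# Two copies of the logistic map with imperceptibly different parameters.
r1, r2 = 3.9, 3.9 + 1e-12  # tiny "parameter perturbation"
x1 = x2 = 0.5
for step in range(1, 201):
    x1 = r1 * x1 * (1 - x1)
    x2 = r2 * x2 * (1 - x2)
    # Threshold each state to get a discrete "token" from each system.
    if (x1 > 0.5) != (x2 > 0.5):
        print(f"discrete outputs first diverge at step {step}")
        break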
comment in response to
post
I should say that by "behavior" I mean the result of just one inference pass, as opposed to long-term dynamics.
comment in response to
post
You're making a simpler and stronger point, I believe: behavior changes *discontinuously* with parameters, a major departure from most neural nets. Traditional "structural stability" is more subtle, and my guess is it would probably be hard to show any real-world transformer is structurally stable.
comment in response to
post
Thanks for this very useful survey! A question: what exactly is your definition of "structural stability"? Usually the term applies to dynamical systems, but how exactly is a transformer a dynamical system? (It actually looks to me like you might be talking about "continuity" instead?)
comment in response to
post
They very much do believe AGI is achievable, and in the (relatively) near future. There are entire social circles in San Francisco that take this for granted. Keep in mind, though, that "intelligence" means something narrow for this crowd, namely pure cognitive capability.
comment in response to
post
There is definitely a point where it breaks down. But I've used it for routine code tasks for about a year, and it's been extremely reliable. Saved me a lot of tedium!
comment in response to
post
Asking an LLM to summarize data is a terrible idea. But ChatGPT is great at writing code for mundane data transformations.
comment in response to
post
The transformer architecture in the description is the "laws of physics" for an LLM. But that's not what makes LLMs work: random transformers do nothing. The power comes from a very specific combination of billions of parameters, which (like the brain) have a rich, intricate structure.
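A minimal sketch of that point (assuming the Hugging Face transformers library; the model choice and prompt are just for illustration): the same architecture with random parameters versus pretrained ones behaves completely differently.

from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
inputs = tok("The capital of France is", return_tensors="pt")

random_model = GPT2LMHeadModel(GPT2Config())             # same "laws of physics," random parameters
trained_model = GPT2LMHeadModel.from_pretrained("gpt2")  # the specific learned parameters

for name, model in [("random", random_model), ("trained", trained_model)]:
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    print(name, "->", tok.decode(out[0]))
# The random model emits word salad; the pretrained one continues fluently.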
comment in response to
post
Are you seeing this from a dualist position (there's something outside of the laws of physics in the brain)?
comment in response to
post
Couldn't you write an equally low-level description of the brain, full of chemical formulas and equations?
comment in response to
post
That definitely doesn't sound like a win for LLMs. Seems like a classic example of a purpose-built system being a better choice when reliability is critical!
comment in response to
post
How can we be sure the process is just inductive, though? It seems conceivable that some of these systems may do some sort of reasoning. I don't think we can say much with any certainty about the high-level mechanisms inside these models (especially with closed-source frontier systems).
comment in response to
post
Extremely interesting data point! Do they pay your company the same amount as before? Or is it possible there's still some net savings?
comment in response to
post
Gemini 1.5 Pro, however, fumbles the ball! Maybe the takeaway is that questions like this could be good for differentiating between specific chatbots, but don't tell us anything intrinsic about how LLMs work in general.
comment in response to
post
I tried this on Claude, and it too produced a correct, well-explained answer.
comment in response to
post
What version of ChatGPT did you test? I just tried your exact prompt with 4o and got what looks like a perfect (and well-explained) result. (Or am I misreading?)
comment in response to
post
That's an excellent point! "Eagerness to talk about it" and "enthusiastic user" are definitely not the same.
comment in response to
post
My unpopular opinion is that the limpid geometry of linear transformations is far nobler than the bureaucratic murk of formal grammar!