This is excellent - crammed with practical advice about how to build useful systems that use LLMs to run tools in a loop to achieve a goal. Wrote some short notes here: https://simonwillison.net/2025/Jan/11/agents/
Comments
I need to read this again more deeply. But what do you make of this? I can't understand the value proposition. The explanations are nice, but the setup and validation steps seem immense, time-consuming, and tightly coupled to the systems they work in, while correct output isn't even guaranteed.
This is interesting, but there are some really bad base assumptions about the 'reasoning' AI is doing that fundamentally misunderstand the technology. It can't reason; it's giving you the average of the data it's been given.
Not that there isn't a use case for that, but when it comes to things like forecasting it can be extremely dangerous. Assuming today will be like yesterday is how you get wiped out by black swans.
These tools are not accurate, but that isn't a problem when you use them for _inputs_ (say, classifying whether images are galaxies or whether cells are cancerous). You account for the accuracy and have a human interpret the results with that in mind.
This article is proposing _outputs_, which means ANY mistakes by the AI will be high cost, with no chance for a human to correct them.
Best I've seen, with max cloud compute costs, is 90%ish accurate.
Whether or not LLMs can "reason" very much depends on which definition of "reasoning" you are using
I'm confident that they can perform an imitation of "reasoning" that's good enough for things like executing a plan to run some tools with a high enough success rate to be useful
They are predictive language models; they cannot reason. That's why every single one breaks when you give it trick questions.
In this paper they clearly identify the success rate needed and it's way past what any current model is capable of.
Even a 1% error rate compounds to gibberish really fast. Meanwhile, AI is slowing down; each iteration is less of a step above the previous one. Realistically, we have to assume 90-95% is the best accuracy we're ever going to get.
And that rules out agents.
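To make the compounding claim above concrete, here is a quick back-of-the-envelope sketch. It assumes every step of an agent run must succeed and that steps fail independently at a fixed rate, which is a simplification (real failures are correlated and sometimes recoverable):

```python
# Back-of-the-envelope: probability an entire multi-step run is correct,
# assuming each step succeeds independently with the given per-step accuracy.
def run_success_rate(per_step_accuracy: float, steps: int) -> float:
    return per_step_accuracy ** steps

for acc in (0.99, 0.95, 0.90):
    for steps in (10, 20, 50):
        print(f"{acc:.0%}/step over {steps} steps -> "
              f"{run_success_rate(acc, steps):.1%} chance the whole run succeeds")
```

At 99% per step, a 50-step run still only succeeds about 60% of the time; at 90% per step it's under 1%.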
Yes, but then you don't have "agents". Getting the gist is fine when you're translating a menu for yourself; it's terrible if you're the restaurant and want to reach customers who speak another language.
Thank you for sharing this. It's going to take me a bit to get through it, but from skimming it, the planning sections align nicely with a few things I have thought a lot about but was slightly too overwhelmed to commit to code.
With GPT-4(ish)-level LLMs being locally accessible, I wonder what embedding an agent in some of my Django projects might look like. It would be something like giving it some tools and data, and then small jobs it can do.
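As a rough illustration of that idea, here is a minimal tool-in-a-loop sketch. It is deliberately SDK-agnostic: `call_llm` is a placeholder for whatever local model API you use, and the Django-flavoured tool names (`count_overdue_invoices`, `send_reminder_email`) are made up for the example:

```python
import json

# Hypothetical tools the agent can call; in a Django project these might wrap
# ORM queries or queue tasks. Both are stand-ins for illustration only.
def count_overdue_invoices() -> str:
    # e.g. Invoice.objects.filter(due_date__lt=timezone.now(), paid=False).count()
    return "3"

def send_reminder_email(invoice_id: str) -> str:
    # e.g. django.core.mail.send_mail(...) or a queued background task
    return f"reminder queued for invoice {invoice_id}"

TOOLS = {
    "count_overdue_invoices": count_overdue_invoices,
    "send_reminder_email": send_reminder_email,
}

def call_llm(messages: list[dict]) -> dict:
    """Placeholder for the local model. A real version would send `messages` to
    your LLM and parse its reply into {"tool": name, "args": {...}} or
    {"final": "answer"}. Here it returns a canned script so the loop runs."""
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"tool": "count_overdue_invoices", "args": {}}
    return {"final": "There are 3 overdue invoices."}

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": f"You can call these tools: {list(TOOLS)}"},
        {"role": "user", "content": goal},
    ]
    # Cap the number of steps so a confused model can't loop forever.
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "Gave up: step limit reached."

print(run_agent("How many overdue invoices do we have?"))
```

The step cap feels like the important part: a confused model should run out of turns rather than loop forever against your database.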
The longer the context window, the longer the plan can be, thanks to context memory. The stronger the interpretability, the better the plan, thanks to proper causal inference.