Tech companies and engineering teams rolling out customer-facing AI agents will very quickly realize a massive problem with them:
They are non-deterministic.
The classic engineering loop of build --> test --> run automated tests to catch regressions -- that will NOT work here!
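To make that concrete, here's a minimal sketch (mock agent, all names hypothetical) of why exact-match regression tests break on a non-deterministic system, and what a pass-rate threshold over many trials looks like instead:

```python
import random

# Hypothetical stand-in for a real LLM-backed agent: same input,
# different output on every call (all names here are made up).
def mock_agent(prompt: str) -> str:
    phrasings = [
        "Your refund has been approved.",
        "Approved -- the refund is on its way.",
        "We've approved your refund request.",
    ]
    return random.choice(phrasings)

# Classic regression testing asserts exact output -- with a
# non-deterministic agent, two runs need not agree, so this can flip
# between pass and fail with no code change at all.
def exact_match_test() -> bool:
    return mock_agent("refund please") == mock_agent("refund please")

# One alternative: assert on *behaviour*, not bytes, and require a
# pass rate across many trials instead of a single green check.
def pass_rate(trials: int = 100) -> float:
    passes = sum(
        "refund" in mock_agent("refund please").lower()
        for _ in range(trials)
    )
    return passes / trials

if __name__ == "__main__":
    print(f"pass rate: {pass_rate():.2f}")
```

The mock passes 100% of the time because every phrasing mentions the refund; a real agent would score lower, and the interesting engineering question becomes what threshold you ship at.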
Comments
about 6 months ago, we bought a new washer/dryer; Consumer Reports gave high marks to many LG models, so I went with one of those
there is no manual
I repeat: there is no manual for the washer
unreal
Maybe something about how engineers "observe" how things run.
We could call it something like "observability" maybe.
When you deploy 100s of times a day on 100s of components, you need to observe your system holistically (and effectively) and shorten feedback loops to truly be able to control it.
It's about understanding what's *actually* going on, and being able to ask *new* questions about what happened in the past.
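A rough sketch of what "being able to ask new questions later" implies in practice: emit every agent call as a structured event, rather than a free-text log line. (This is an illustrative shape, not any particular observability product's API.)

```python
import json
import time
import uuid

# Log one structured JSON event per agent call. Field names are
# illustrative assumptions; the point is that a log pipeline can later
# slice by model, temperature, or response size without new code.
def log_agent_call(prompt: str, response: str,
                   model: str, temperature: float) -> dict:
    event = {
        "trace_id": uuid.uuid4().hex,   # correlate with downstream calls
        "timestamp": time.time(),
        "model": model,
        "temperature": temperature,
        "prompt": prompt,
        "response": response,
        "response_chars": len(response),
    }
    print(json.dumps(event))  # one JSON object per line
    return event

event = log_agent_call("refund please", "Approved.", "gpt-x", 0.7)
```

Because each event carries the sampling parameters alongside the text, a question you didn't anticipate ("did high-temperature calls approve more refunds?") becomes a query over old logs instead of a new deploy.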
Do you have any insight in how people have been tackling this? Or have they just not?
I wonder how these agents will be versioned, as I believe they need to be. Will temperature and token window be versioned?
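One way it could work (an assumption, not an established practice): treat every sampling parameter as part of the agent's identity and derive a version id from the whole config, so a temperature tweak is a new version just like a code change.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Illustrative sketch: hash the full agent configuration into a
# short, stable version id. All field names are hypothetical.
@dataclass(frozen=True)
class AgentConfig:
    model: str
    temperature: float
    context_window: int
    system_prompt: str

    def version(self) -> str:
        # Canonical JSON -> stable hash -> short version id.
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:12]

a = AgentConfig("gpt-x", 0.7, 128_000, "You are a support agent.")
b = AgentConfig("gpt-x", 0.0, 128_000, "You are a support agent.")
print(a.version(), b.version())  # different ids: temperature changed
```

Identical configs always hash to the same id, so the version can be stamped onto every logged agent call and compared across deploys.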
How often is it ok to not record a sale, or cancel your insurance?
The software industry is unprepared for indeterminism.
I wrote about this a couple of years ago: https://www.aylett.co.uk/thoughts/llm_repeatability
"Would you trust it to approve customer refund requests?"
It's going to be hilarious when companies replace customer service with AI agents, and people figure out how to trick those agents into giving them free stuff.