I was curious if a newer model would do any better, and it did.
I only replicated the first part of your experiment, and I didn't try to reproduce my own results. I verified locally with standard sort and uniq.
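For reference, that local check is two one-liners. A minimal sketch, assuming the LLM's output is saved one name per line in a file (flowers.txt is my placeholder name):

```shell
# Save a sample of the LLM's output, one flower name per line.
printf '%s\n' 'Winter Jasmine' 'Wood Anemone' 'Wisteria' > flowers.txt

# Canonical alphabetical order, for comparison against the LLM's order.
sort flowers.txt

# Prints nothing if there are no duplicate lines.
sort flowers.txt | uniq -d
```

(`uniq -d` only detects adjacent duplicates, which is why the list is sorted first.)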
I found ChatGPT failed to sort alphabetically (e.g. it gave Winter Jasmine, Wood Anemone, Wisteria; alphabetically, Wisteria precedes Wood Anemone).
My conclusions from one experiment:
1. The LLM was useful for generating flower names, as long as I didn't care whether the names were real.
2. All steps required verification.
But when would a person ever want a list of made-up flower names? And since someone might, why wouldn’t that be a special instruction, with real names as the default? And if human verification is necessary at every step (which it definitely is)…how is this saving any time or human energy?
It certainly didn't save me any time or energy -- I'm very impressed by what LLMs can do (I think Simon Willison does the best practical exploration), but they are fundamentally bullshit machines. It's disappointing relative to what I want AI to be.
Interestingly, for the original task I based this on, it wouldn't have mattered a bit if the names were real or not. (I needed to come up with a shitload of variable names, so I generated lists of adjectives, colors, plants, and animals and combined them into pairs like cuddly-rose and red-octopus.)
Which, tbh, is why I used ChatGPT for it. That would have been an irritating task to do on my own, and all I needed it to produce was words, without regard to semantics or truth.
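For what it's worth, the combining step is a few lines of shell once you have the word lists; the lists below are my own placeholders, not the ones from the original task:

```shell
# Cross two made-up word lists into hyphenated variable names,
# in the spirit of cuddly-rose / red-octopus.
adjectives='cuddly sleepy red'
nouns='rose fern octopus'

for a in $adjectives; do
  for n in $nouns; do
    printf '%s-%s\n' "$a" "$n"
  done
done > names.txt
```

Three adjectives crossed with three nouns yields nine names, and there's nothing to fact-check.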
In a parallel universe somewhere, the “AI” is accessing a series of structured APIs to 1) get a list of flowers and their colors from a database, and then 2) randomize and sort the tabular data using ironclad static code that was written for that purpose half a century ago.
The AI middleman could still make mistakes, but it would be a thousand times easier to QC the process and identify points of failure when the pipeline is composed of discrete, known, purpose-built services instead of a giant black-box statistical engine! Why are we doing this?!
Because the black box lets people make vaguer promises when they sell it, and because some true believers expect the black box to eventually become Techno-Yahweh.
https://chatgpt.com/share/680f9cdd-1aa0-800d-8cd7-11652d2ea31a