around the 3.5 era claude was legitimately smarter than comparable models. 3.7 seems like it shaved off a lot of breadth to make the model maximally flexible about tasks of arbitrary complexity, but that made it significantly worse than comparable models at tasks that are mostly rote
it’s absolutely not my original! i got it off a podcast, i’m pretty sure @alexcoxfm.bsky.social said it and i’m not sure if it’s their coinage or if they got it from someplace else
ime the wordcel factor is good for the business email automation which i rarely use but also makes it good for doing open-ended/design stuff, so ironically the shit-for-code bots are also the ones that i would expect to figure out some galaxy brained race condition or something
i will say that they are pretty decent at some drudgery work - I wanted a vision model to extract definitions from lecture notes and then create flash cards based on them - using the mathml format for displaying them
I think that’s right about drudgery work. I’ve just become disenchanted with the quality of the output for editing my writing and the number of errors when I want factual information. Have a hard time trusting it.
and so far wow it has been effective - it’s hard to imagine another solution that would give me as much utility for the cost of a few cents per api call
i find chatgpt generally to have a tone that i just don’t like too much - it’s unfortunate because they’re not usually that reliable in terms of how they solve things
4o is like a more sycophantic Claude 3.7 Sonnet, and stellz's complaint about that is fair.
Chatting with the computer is fun to me, but that's an entirely different use case than getting shit done. Gemini 2.5 Pro is my fave of the big cloud models for getting shit done.
it also thinks forever in copilot mode only to be wrong. I think you are kind of a sucker if you pay for Claude Code instead of a Copilot subscription.
i will say i think the newest gemini is probably the best coder out of the big bois right now, but i find claude to be perfectly cromulent/better than openai.
Since the o family was derived from 4o initially, I guess it's meant to be like 4o3?!
i guess i haven't given o4-mini a fair shake
What are you using?