Claude actually kinda sucks I think like basically every other one is better - ThreadSky

piss.beauty • 26 days ago

Claude actually kinda sucks I think like basically every other one is better

Comments

at roughly 3.5 era claude was legitimately smarter than comparables, 3.7 seems like it shaved a lot of breadth to make the model maximally flexible about tasks of arbitrary complexity but made it significantly worse than comparable models at tasks that are mostly rote

segyges.bsky.social•26 days ago

if i wanted smooth writing i would do 3.7 or R1 right now. if i want code written i will use a different deepseek or chatty gee

serenity.tgirl.gay•26 days ago

chatty gee is my new favourite way of referring to Mr. Gippity

segyges.bsky.social•26 days ago

it's a @lathrys.at original and it makes me think of Ali G every time which has upsides and downsides

lathrys.at•26 days ago

it’s absolutely not my original! i got it off a podcast, i’m pretty sure @alexcoxfm.bsky.social said it and i’m not sure if it’s their coinage or if they got it from someplace else

serenity.tgirl.gay•26 days ago

still a banger

piss.beauty•26 days ago

I drive a mix of o3/o4 mini and Gemini right now and it works so much better than 3.7 it's not even funny

futur.blue•26 days ago

I still find it to be a better wordcel on average than gemini 2.5 and o3 though

segyges.bsky.social•26 days ago

it's a top tier wordcel, 3.7 and R1 win my top wordcel marks

piss.beauty•26 days ago

yeah but writing stuff is the idiot use case of these things

segyges.bsky.social•26 days ago

ime the wordcel factor is good for the business email automation which i rarely use but also makes it good for doing open-ended/design stuff, so ironically the shit-for-code bots are also the ones that i would expect to figure out some galaxy brained race condition or something

segyges.bsky.social•26 days ago

we have junior swe bot and extremely smart pm/architect bot and they're not the same bot, curiously

futur.blue•26 days ago

yeah but sometimes you just want a guy yknow

jim-greco.com•26 days ago

The more I use these chat bots the less impressed I am. Using them so much less.

aidenfoxivey.com•26 days ago

i will say that they are pretty decent at some drudgery work - I wanted a vision model to extract definitions from lecture notes and then create flash cards based on them - using the mathml format for displaying them

jim-greco.com•26 days ago

I think that’s right about drudgery work. I’ve just become disenchanted with the quality of the output for editing my writing and the number of errors when I want factual information. Have a hard time trusting it.

aidenfoxivey.com•26 days ago

sometimes it seems to be a decent search engine too? idk i’m too young to remember good search engines

aidenfoxivey.com•26 days ago

and so far wow it has been effective - it’s hard to imagine another solution that would give me as much utility for the cost of a few cents per api call

aidenfoxivey.com•26 days ago

of course it does require i go and verify each new definition it adds, but it hasn’t been too bad compared to writing all the flashcards by hand

aidenfoxivey.com•26 days ago

i find chatgpt generally to have a tone that i just don’t like too much - it’s unfortunate because they’re not usually that reliable in terms of how they solve things

piss.beauty•26 days ago

all of them have this problem unfortunately

aidenfoxivey.com•26 days ago

:( need to go back to my trusty markov chain

ens0.me•26 days ago

As far as ChatGPT goes, o3 >>>>>> 4o

aidenfoxivey.com•26 days ago

oh yeah that’s something i don’t get at all- why does the o migrate to the right side of the number?

ens0.me•26 days ago

Their branding just sucks ass, that's why, lmao

Since the o family was derived from 4o initially, I guess it's meant to be like 4o3?!

ens0.me•26 days ago

4o is like a more sycophantic Claude 3.7 Sonnet, about which stellz's complaint is fair.

Chatting with the computer is fun to me, but that's an entirely different use case than getting shit done. Gemini 2.5 Pro is my fave of the big cloud models for getting shit done.

halzaldivar.bsky.social•26 days ago

Yeah, not my first choice for Pope, but hey, times change

alice.mosphere.at•26 days ago

many are saying this

alexwilson.bsky.social•26 days ago

alice.mosphere.at•26 days ago

gemini 2.5 pro + o3 + o4-mini(-high) is the way

piss.beauty•26 days ago

nah high is stupid

piss.beauty•26 days ago

o4 mini like barely works in comparison to o3 in my experience

alice.mosphere.at•26 days ago

oh yeah but o3 is pricy af

piss.beauty•26 days ago

no I mean the mini version. o4 legit is worse

piss.beauty•26 days ago

Claude is for people who want to talk to the computer, which I don't. I merely want it to perform a limited semantic infill task and shut up

aly.ruffruff.party•26 days ago

I want a man page and API doc lackey basically

slime.bsky.social•26 days ago

thats great claude but can we wrap it up i got shit to do

piss.beauty•26 days ago

it also thinks forever in copilot mode only to be wrong. I think you are kind of a sucker if you pay for Claude code instead of a copilot subscription.

segyges.bsky.social•26 days ago

3.7 at least is legitimately terrible as a codebot for copilot stuff

aparker.io•26 days ago

are you using it to do rust because ime claude is really bad at rust

piss.beauty•26 days ago

no, because it's not usable. the other models got a bit better at that.

aparker.io•26 days ago

i will say i think the newest gemini is probably the best coder out of the big bois right now, but i find claude to be perfectly cromulent/better than openai.

i guess i haven't given o4-mini a fair shake

segyges.bsky.social•26 days ago

i should probably actually look at gemini, i have had them written off for a long time

piss.beauty•26 days ago

it's not as good as o3-mini actually