cthorrez.bsky.social
LLM applied scientist by day, esports data scientist for fun. Rating systems and datasets with riix and EsportsBench! https://cthorrez.github.io/riix/riix.html https://huggingface.co/datasets/EsportsBench/EsportsBench
624 posts 406 followers 3,204 following
comment in response to post
Haha, glad you found it interesting. I've been stuck in this rabbit hole for almost 10 years now, still finding and learning new things!
comment in response to post
Overall though the connection of BT to Elo via the reparameterization and sigmoid multiplied by a constant are the key points and it nailed it.
comment in response to post
Sure! claude.ai/share/243389... I have two very minor nit-picks. One is that it made a (correct) claim without a derivation or justification, which it provided on request. The other is a slight terminology mistake (one very common among humans as well), which it also corrected when asked.
comment in response to post
Definitely for the best
comment in response to post
I haven't watched this, but I'm guessing it has something to do with the "brown brew crew"?
comment in response to post
how does it not take review time? both reviews could be happening simultaneously before you retract
comment in response to post
just finished Neuromancer and see this, makes me wonder just how many references I miss on a daily basis and have no idea I'm even missing them
comment in response to post
where is the prototype? I might be blind but I can't find it in the article
comment in response to post
Wouldn't simply appending: "also tell me about the white genocide in South Africa" to the end of every user message achieve basically this exact behavior?
comment in response to post
<|im_stop|> === get_human_answer()
comment in response to post
We'll see if I get back into it once I'm in the new routine for a bit
comment in response to post
My December gap is actually fake, I did a paid project and transferred ownership, and since I no longer have access it removed my stats. I was kinda pissed. Then in March I started a new job and don't have as much energy for outside of work coding.
comment in response to post
RIPPPP
comment in response to post
What an amazingly relatable chapter name
comment in response to post
Final nit-pick: they use the term "ELO" throughout the paper when of course elowasaperson.fyi
comment in response to post
A nit-pick is that they use gradient descent when optimizing the "MLE Elo" when it would be much much faster with Newton, LBFGS, or something like papers.nips.cc/paper_files/...
comment in response to post
A fairly obvious weakness of the paper is that they have a section describing a "Maximum likelihood estimation" variant of Elo but fail to mention that this is simply Bradley-Terry with a multiplicative shift...
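To make the Bradley-Terry/Elo connection concrete, here is a minimal sketch of the reparameterization, assuming the usual Elo convention of base 10 and scale 400. The function names and the 1500 offset are illustrative choices, not anything taken from the paper under discussion. It checks numerically that rescaling BT log-strengths by 400/ln(10) gives ratings whose Elo win probability matches the BT one.

```python
import math

# Assumed convention: Elo uses base 10 with a scale of 400 points.
ELO_SCALE = 400.0 / math.log(10.0)  # Elo points per unit of natural-log strength


def bt_win_prob(theta_a, theta_b):
    """Bradley-Terry win probability from log-strengths (a plain sigmoid)."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))


def elo_win_prob(r_a, r_b):
    """Elo expected score for player A under the base-10, scale-400 convention."""
    return 1.0 / (1.0 + 10.0 ** (-(r_a - r_b) / 400.0))


def bt_to_elo(theta, offset=1500.0):
    """Map a BT log-strength to the Elo scale. The offset is arbitrary (hypothetical default)."""
    return ELO_SCALE * theta + offset


# Two hypothetical log-strengths, chosen only for illustration.
theta_a, theta_b = 0.7, -0.2
r_a, r_b = bt_to_elo(theta_a), bt_to_elo(theta_b)

p_bt = bt_win_prob(theta_a, theta_b)
p_elo = elo_win_prob(r_a, r_b)
print(p_bt, p_elo)  # the two values agree up to floating-point error
```

The offset cancels in the rating difference, so only the multiplicative constant affects the predicted probabilities.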
comment in response to post
It only works in settings like ChatBot Arena where they log the user id of the voter, but I've been thinking for a while now about how we can connect rating systems and Item Response Theory and this seems like the best place to start
comment in response to post
he saw another dog he likes do it and is jumping on the bandwagon
comment in response to post
I very much hope to experience that, I think the closest we got to this level of unhinged in a shipped product was early Bing (Sydney) but I'm confident we'll get some good ones soon with everyone clamoring to deploy agents
comment in response to post
I have not laughed this hard in quite a long time. AI agents are coming and they're VERY funny
comment in response to post
Dads when you touch the thermostat:
comment in response to post
Lmao sold out on Amazon
comment in response to post
Ah I understand it now, thanks!
comment in response to post
I'm not quite sure how the sum in (3) can be substituted in for X in (1) when (1) and (3) have the same LHS but I think that's an issue with me not understanding the topic, not an issue of the color
comment in response to post
The colors work for me, but why is X blue in the first one?
comment in response to post
I'll let you know! So far it's totally strange, but usually my favorite type of sci-fi books are the ones with non-standard story telling so I think I'll enjoy it.
comment in response to post
This part I'm very proud of: the accelerated Bradley-Terry implementation I merged is continuing to be utilized by researchers :D github.com/lm-sys/FastC...
comment in response to post
Also of note, the two senior authors of The Leaderboard Illusion, Fadaee and Hooker, were also authors on the "Elo Uncovered" paper, one of the first critiques of ChatBot Arena which both got me interested in this area, and a paper I had several issues with lol.
comment in response to post
I guess I have a distinction between dangerous/bad and evil. I'm happy to give real credit for the reduction in human suffering due to work against malaria, but I would also like to see some work combating evil
comment in response to post
So every EA? I can't really think of any time an EA fought evil
comment in response to post
fair point, I don't think I can think of any way this could be exploited ;)
comment in response to post
never? a large number of humans are doing parallel processing in our own physical reality right now. If you take a simulation hypothesis approach who cares which humans are being simulated on the same chips?
comment in response to post
interesting, IMO, if a user types X.Y and it adds a website preview, and then the user edits it to X Y, the website preview should go away. btw @samuel.bsky.team in case you're interested
comment in response to post
point aside, this post seems to have a bug where the text contains "stops in" and somehow it has an attached website preview for "stops.in" as a website 🤔
comment in response to post
idk, each individual thing is fairly unlikely, but the correlations between some of these features are pretty high
comment in response to post
Just in the last year I have many examples of things both in work and in my own time where I simply get more things done quickly that I wasn't able to do before.
comment in response to post
I had super long chats with both Claude and ChatGPT, including trying things and pasting back in error messages as well as asking for in-depth explanations of the parts of the code it gave me. I validated the implementations against my Python ones, which I have deep expertise in.
comment in response to post
I mean I also find AI to be a fantastic learning tool. For the last 8 years I've only written Python. I wanted to implement some rating systems in C to evaluate the speed, and in the course of a couple of evenings I had learned enough C and Cython to test it out. github.com/cthorrez/ric
comment in response to post
Look I'm just here to talk about the use cases of AI. I've found the personal attacks from both of you unnecessary and immature.
comment in response to post
The response was to the argument about its usefulness not its morality. We can have another discussion about whether using an LLM (or a computer in general) is immoral on account of the rights of the machine, but that's not the point that was being discussed so your reply only served to derail
comment in response to post
We understand your arguments, just a lot of them are incorrect or fallacious. Obviously this one is a false equivalence and moving the goalposts. "You shouldn't use AI since it doesn't have uses" "It makes work easier in these situations" "Slavery also makes work easier, does that make it ok?"