eointravers.bsky.social
I'm a Data Scientist, working on responsible AI for mental health. Posting about data, AI, evals, and cognitive science. eointravers.com 🇮🇪
90 posts 118 followers 503 following
Regular Contributor
Active Commenter
comment in response to post
I think deep linear networks are an example of this, where you have a deep model with just the capacity of regular linear regression.
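For illustration, a minimal numpy sketch (shapes and names are mine): stacking linear layers without a nonlinearity between them collapses to a single matrix product, i.e. plain linear-regression capacity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers, no nonlinearity between them (arbitrary shapes).
W1 = rng.normal(size=(8, 4))   # layer 1: 4 -> 8
W2 = rng.normal(size=(3, 8))   # layer 2: 8 -> 3
x = rng.normal(size=4)

deep = W2 @ (W1 @ x)           # the "deep" linear network
shallow = (W2 @ W1) @ x        # one equivalent linear map

assert np.allclose(deep, shallow)  # same function; depth adds no capacity
```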
comment in response to post
Bullshitting Engines?
comment in response to post
To be fair, if you're doing analytic philosophy, "bullshit engine" reads as an engine that is bullshit, not an engine that engages in the communicative act of bullshitting; engines don't communicate or have concerns, they set things in motion.
comment in response to post
Sorry, broke the second link: www.cell.com/neuron/fullt...
comment in response to post
Yup: pmc.ncbi.nlm.nih.gov/articles/PMC... (commentary: www.cell.com/neuron/fullt...). Lots of later work on the same idea since then as well.
comment in response to post
It's in the opposite direction, though: once your participants know how you want them to behave, they're pretty likely to behave that way. en.m.wikipedia.org/wiki/Demand_...
comment in response to post
I think it's a choose-your-own-null adventure kind of thing, like you're worried it might be. I can see the (bad) argument: if you allow lots of parameters to vary on either side of the dotted line, any differences prove that the line is important?
comment in response to post
Um. statmodeling.stat.columbia.edu/2021/11/21/s...
comment in response to post
If you think different slopes are bad, you should see www.tandfonline.com/doi/abs/10.1... (and related posts: statmodeling.stat.columbia.edu?s=regression...). Polynomial madness.
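To make the "madness" concrete, a hedged sketch (the data, cutoff, and degrees here are all invented): fit each side of a cutoff separately, as regression-discontinuity designs do, and watch the high-order fit hallucinate a jump that isn't there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Smooth underlying relationship with no real discontinuity at x = 0.
x = np.linspace(-1, 1, 60)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=x.size)

left, right = x < 0, x >= 0
for degree in (1, 8):
    # Fit each side of the cutoff separately and compare at the boundary.
    fit_l = np.polyval(np.polyfit(x[left], y[left], degree), 0.0)
    fit_r = np.polyval(np.polyfit(x[right], y[right], degree), 0.0)
    print(f"degree {degree}: estimated jump at cutoff = {fit_r - fit_l:+.3f}")

# High-degree polynomials chase noise near the boundary, so the estimated
# "jump" can be sizeable even though the true jump is exactly zero.
```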
comment in response to post
For one-in-three, it's ~70.4%. One-in-four, ~68.4%. As n increases, the answer gets closer and closer to 1 - 1/e ≈ 63.2%, where e is Euler's number. en.wikipedia.org/wiki/E_(math... Why 1 - 1/e? Honestly, you would have to ask someone better at maths than me, but I think it's a pretty cool result.
comment in response to post
So the prob. that it does happen at least once is the probability that it *doesn't not happen*: 1 - (1 - 1/n)^n. For a one-in-two chance, this works out as 1 - (1 - 1/2)^2 = 1 - 1/4 = 75%.
comment in response to post
The prob. of trying twice and it not happening is the prob. of it not happening the first time, times the prob. of it not happening the second time: (1 - 1/n) * (1 - 1/n), or (1 - 1/n)^2. The prob. of it not happening in n attempts is (1 - 1/n)^n.
comment in response to post
If you take a one-in-n chance, the probability of it coming off is 1/n. If you roll a six-sided die, the probability of rolling a 6 is 1/6. The prob. of the event not occurring is one minus the probability that it does occur: 1 - 1/n.
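Putting the steps in this thread together, a quick numerical check (a minimal sketch; the function name is mine):

```python
import math

def p_at_least_once(n: int) -> float:
    """Probability that a 1-in-n event happens at least once in n tries."""
    return 1 - (1 - 1 / n) ** n

for n in (2, 3, 4, 100, 1_000_000):
    print(f"n = {n:>9}: {p_at_least_once(n):.4f}")

print(f"limit, 1 - 1/e: {1 - 1 / math.e:.4f}")  # ~0.6321
```

The convergence falls out of the classic limit definition of e: (1 - 1/n)^n → e^(-1) as n grows, so the hit probability tends to 1 - 1/e.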
comment in response to post
But first, @xkcd.com xkcd.com/882/
comment in response to post
Poker is probably an interesting case study here, because AFAIK expert poker players don't try to solve K-level theory of mind problems; they just have really good heuristics.
comment in response to post
In principle, that might mean we can get LLMs to reason under uncertainty pretty well if we fine-tune on the right heuristics?
comment in response to post
There's an old idea in psychology (e.g. core.ac.uk/download/pdf...) that when people do perform well under uncertainty, it's because they're pattern matching using the right heuristics, rather than doing Bayesian inference.
comment in response to post
To be fair, humans are famously also pretty bad at it, so this one might be a draw.
comment in response to post
Yes, but also: xkcd.com/927/
comment in response to post
and, as an illustration, uses an LLM to quickly check the job spec against my requirements. A browser extension is a more mature way of achieving the same thing, but would take me considerably more time to get up and running.
comment in response to post
Just realised you're doing this in the Buttondown UI, so you have no need for a nice python abstraction, but I had my fun anyway.
comment in response to post
This is nice. I got sidetracked by this, and accidentally spent 10 minutes ALMOST figuring out how to make this compatible with `+` and `|` operators. Make of that what you will.
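For anyone wondering what I was attempting, a minimal sketch of that kind of operator overloading (the class and its semantics are invented): Python's `__add__` and `__or__` dunders let objects compose with `+` and `|`.

```python
class Step:
    """Toy composable step; purely illustrative."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # `a + b`: run both steps in sequence.
        return Step(f"({self.name} then {other.name})")

    def __or__(self, other):
        # `a | b`: fall back to `other` if `self` fails.
        return Step(f"({self.name} else {other.name})")

print((Step("fetch") + Step("parse") | Step("retry")).name)
# -> ((fetch then parse) else retry)
```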
comment in response to post
YES! I've been planning to write more about the importance of this stuff (largely, to be fair, for selfish reasons, since I'm a psych-methods-to-AI person, and I'm on the job market). eointravers.com/blog/convo-e...
comment in response to post
[...] would benefit from learning transferable experimental psychology and AI skills by doing this as dissertation or even group projects.
comment in response to post
Thanks. Those links are...interesting. 🤔
comment in response to post
TLDR: conversations are graphs, and each node contains a) a prompt guiding what the chatbot should say, and b) possible classifications for the user's response, dictating which node we go to next. That's most of what you need!
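A minimal sketch of that structure (all the names here are mine, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    prompt: str  # guides what the chatbot should say at this point
    routes: dict = field(default_factory=dict)  # response label -> next node

# Toy graph: classify the user's response, then follow the matching edge.
graph = {
    "greet": Node("Ask how the user is feeling.",
                  {"positive": "wrap_up", "negative": "explore"}),
    "explore": Node("Ask what has been difficult lately.", {}),
    "wrap_up": Node("Reflect back and close the conversation.", {}),
}

def next_node(current: str, label: str) -> str:
    # `label` would come from classifying the user's response (e.g. with an LLM).
    return graph[current].routes.get(label, current)

print(next_node("greet", "negative"))  # -> explore
```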
comment in response to post
Come on.
comment in response to post
Ah.
comment in response to post
Also, a different domain, but this work on using LLMs as proxy human participants for cognitive modelling might be of interest: arxiv.org/abs/2502.00879
comment in response to post
ref 4 (Argyle) seems to be suggesting generalising from LLMs to humans. Sorry for the rabbit hole, my question is just: Who is openly saying we should study LLMs in place of humans, so I can avoid them? Thanks!
comment in response to post
I'm sure there are crackpots on the internet saying this, but is this something actual social scientists say? 😟 A lot of what I've seen on this, including some of your early refs, essentially say "you can do this, but obviously you should only do it for piloting", although [...]
comment in response to post
This looks like great work! 🚀 Could I ask about the "LLMs are replacing human participants" bit? Are there serious people out there claiming that we can analyse the outputs of LLMs role-playing as members of minority groups, and generalise from these to actual behaviour of ppl from those groups?
comment in response to post
@hamel.bsky.social whoops, didn't see that one, thanks. I'll add a link to it. I don't disagree with your advice to focus on binary decisions initially, but I do think there's a lot of value in getting stakeholders to agree, early on, on provisional criteria for what would make a "good" interaction.
comment in response to post
From that point of view, I don't see any reason LLMs can't be "creative". Before the pitchforks arrive, I need to be careful: I'm not saying that they can create art, or anything like that, but they can produce text that triggers novel, useful ideas in the mind of the reader.
comment in response to post
There's an old, robust idea that creativity is about combining and reassociating existing ideas in novel ways, rather than somehow creating things "from scratch" (whatever that means). e.g. www.themarginalian.org/2013/05/20/a..., or www.semanticscholar.org/reader/927c1...