aaronschein.bsky.social
Assistant Professor of Statistics & Data Science at UChicago
Topics: data-intensive social science, Bayesian statistics, causal inference, probabilistic ML
Proud “golden retriever” 🦮
63 posts · 1,426 followers · 370 following
Regular Contributor
Active Commenter
comment in response to post
Also might this be the first recorded instance of "overlapping communities" in social science? A good question for @azjacobs.bsky.social.
comment in response to post
That’s not true in my experience (I am a researcher in the area)
comment in response to post
Another thing all these tech leaders share is a strong financial incentive to publicly endorse such a belief, regardless of whether their private information supports it.
comment in response to post
Ah true!
comment in response to post
I don’t think the US has formally declared war since WW2. The executive’s military power is only loosely constrained, regardless of Congress.
comment in response to post
Whoa! What language? And were the dtypes of n and y different?
comment in response to post
Cool! Did their use of “object-oriented” refer to the software or to the math? (Perhaps it is hard to disentangle those in this case…)
comment in response to post
I really like the phrase “object-oriented statistics”, which I think @stat110.bsky.social may have coined. Similar to that is “modular statistics” which Matthew Stephens likes to say.
comment in response to post
There must be a joke here involving tails, but I seem to be memoryless at the moment and unable to supply one
comment in response to post
Not keto friendly
comment in response to post
🙋🏼‍♂️
comment in response to post
Correct
comment in response to post
Seems like the biggest departure from assumptions is that there is no cost to setting up an account on both networks
comment in response to post
Well it’s still nice, I’m not complaining
comment in response to post
This was big of you, thanks for putting this together for us
comment in response to post
Great!! Looking forward!
comment in response to post
Oh wow!! Thank you Gem @ Robin: I will happily sign, but would also be very interested in getting more involved w/ organizing this!
comment in response to post
This is actual golden retriever behavior
comment in response to post
We also take credit for America btw, so you’re welcome for that
comment in response to post
It’s true that when people say “VI” without further specifying any divergence, it is assumed they mean “VI with reverse KL”. But “VI” still refers to the superset; there’s just a default value for the selected divergence. What other name would you give the superset?
comment in response to post
But nobody says that… VI refers specifically to posterior approximation.
comment in response to post
Again, here I lay it out, and I agree with your statement/derivation of MLE, but I don't redefine p and q: bsky.app/profile/aaro...
comment in response to post
Yes, that derivation is correct and standard. The problem with it is not its correctness, but rather that it redefines what the symbols p and q refer to. There is no connection between this KL(p || q) and the KL(q || p) in variational inference (except that both involve the KL divergence).
comment in response to post
This is what I was responding to. There is no sense in which MLE can be obtained by "swapping the KL" from VI (unless you also totally redefine the symbols p and q) bsky.app/profile/nolo...
comment in response to post
Actually it confuses two totally different meanings of both the "q" and "p" distributions!
comment in response to post
The previous discussion confuses two totally different meanings of the "q" distribution.
comment in response to post
Yes, in MLE there are no latent variables (or, equivalently, they are marginalized out): bsky.app/profile/aaro...
comment in response to post
Notice there are two different distributions here with "fittable"/"learnable" parameters: p(x, z) and q(z).
Often we use theta and phi to denote those parameters---i.e., p(x, z; theta) and q(z; phi).
MLE is then more formally: min_theta KL(Pr(x) || p(x; theta))
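Spelled out (a worked expansion under the same notation; only the standard KL-to-log-likelihood step is added):

$$
\begin{aligned}
\min_\theta \, \mathrm{KL}\big(\Pr(x) \,\|\, p(x;\theta)\big)
&= \min_\theta \, \mathbb{E}_{x \sim \Pr}\big[\log \Pr(x) - \log p(x;\theta)\big] \\
&= \max_\theta \, \mathbb{E}_{x \sim \Pr}\big[\log p(x;\theta)\big],
\end{aligned}
$$

since the entropy term $\mathbb{E}_{x \sim \Pr}[\log \Pr(x)]$ does not depend on $\theta$; with $\Pr(x)$ the empirical distribution of the data, the last line is the usual average log-likelihood.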
comment in response to post
- Variational inference with "reverse KL" corresponds to minimizing KL(q(z) || p(z | x))
- Variational inference with "forward KL" (which is what "swapping the KL" usually means) corresponds to minimizing KL(p(z | x) || q(z))
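Written out with the variational parameters phi explicit (a sketch in the same notation; the expectations make the asymmetry clear):

$$
\begin{aligned}
\text{reverse KL:}\quad &\min_\phi \, \mathrm{KL}\big(q(z;\phi) \,\|\, p(z \mid x)\big) = \min_\phi \, \mathbb{E}_{z \sim q(z;\phi)}\big[\log q(z;\phi) - \log p(z \mid x)\big] \\
\text{forward KL:}\quad &\min_\phi \, \mathrm{KL}\big(p(z \mid x) \,\|\, q(z;\phi)\big) = \min_\phi \, \mathbb{E}_{z \sim p(z \mid x)}\big[\log p(z \mid x) - \log q(z;\phi)\big]
\end{aligned}
$$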
comment in response to post
There are *three* distributions in this conversation which are distinct:
- Pr(x): the empirical dist of the data
- p(x, z): the assumed generative model, involving latent z
- q(z): the variational approximation of p(z | x)
MLE corresponds to minimizing KL(Pr(x) || p(x)), where z is marginalized out
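To make "z is marginalized out" explicit (a sketch, with theta denoting the generative model's parameters):

$$
p(x;\theta) = \int p(x, z;\theta)\, dz,
\qquad
\hat{\theta}_{\mathrm{MLE}} = \operatorname*{arg\,min}_\theta \, \mathrm{KL}\big(\Pr(x) \,\|\, p(x;\theta)\big).
$$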