aaronschein.bsky.social
Assistant Professor of Statistics & Data Science at UChicago
Topics: data-intensive social science, Bayesian statistics, causal inference, probabilistic ML
Proud “golden retriever” 🦮
63 posts · 1,426 followers · 370 following
Regular Contributor
Active Commenter
comment in response to post
Also might this be the first recorded instance of "overlapping communities" in social science? A good question for @azjacobs.bsky.social.
comment in response to post
That’s not true in my experience (I am a researcher in the area)
comment in response to post
Another thing all these tech leaders share is a strong financial incentive to publicly endorse such a belief, regardless of whether their private information supports it.
comment in response to post
Ah true!
comment in response to post
I don’t think the US has formally declared war since WW2. The executive’s military power is only loosely constrained, regardless of Congress.
comment in response to post
Whoa! What language? And were the dtypes of n and y different?
comment in response to post
Cool! Did their use of “object-oriented” refer to the software or to the math? (Perhaps it is hard to disentangle those in this case…)
comment in response to post
I really like the phrase “object-oriented statistics”, which I think @stat110.bsky.social may have coined. Similar to that is “modular statistics” which Matthew Stephens likes to say.
comment in response to post
There must be a joke here involving tails, but I seem to be memoryless at the moment and unable to supply one
comment in response to post
Not keto friendly
comment in response to post
🙋🏼‍♂️
comment in response to post
Correct
comment in response to post
Seems like the biggest departure from assumptions is that there is no cost to setting up an account on both networks
comment in response to post
Well it’s still nice, I’m not complaining
comment in response to post
This was big of you, thanks for putting this together for us
comment in response to post
Great!! Looking forward!
comment in response to post
Oh wow!! Thank you Gem @ Robin: I will happily sign, but would also be very interested in getting more involved w/ organizing this!
comment in response to post
This is actual golden retriever behavior
comment in response to post
We also take credit for America btw, so you’re welcome for that
comment in response to post
It’s true that when people say “VI” without further specifying any divergence, it is assumed they mean “VI with reverse KL”. But “VI” still refers to the superset; there’s just a default value for the selected divergence. What other name would you give the superset?
comment in response to post
But nobody says that… VI refers specifically to posterior approximation.
comment in response to post
Again, here I lay it out, and I agree with your statement/derivation of MLE, but I don't redefine p and q: bsky.app/profile/aaro...
comment in response to post
Yes, that derivation is correct and standard. The problem with it is not its correctness, but rather that it redefines what the symbols p and q refer to. There is no connection between this KL(p || q) and the KL(q || p) in variational inference (except that both involve the KL divergence).
comment in response to post
This is what I was responding to. There is no sense in which MLE can be obtained by "swapping the KL" from VI (unless you also totally redefine the symbols p and q) bsky.app/profile/nolo...
comment in response to post
Actually it confuses two totally different meanings of both the "q" and "p" distributions!
comment in response to post
The previous discussion confuses two totally different meanings of the "q" distribution.
comment in response to post
Yes, in MLE there are no latent variables (or, equivalently, they are marginalized out): bsky.app/profile/aaro...
comment in response to post
Notice there are two different distributions here with "fittable"/"learnable" parameters: p(x, z) and q(z).
Often we use theta and phi to denote those parameters---i.e., p(x, z; theta) and q(z; phi).
MLE is then more formally: min_theta KL(Pr(x) || p(x; theta))
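Spelled out (a worked expansion under the same notation; only the standard KL-to-log-likelihood step is added):

$$
\begin{aligned}
\min_\theta \, \mathrm{KL}\big(\Pr(x) \,\|\, p(x;\theta)\big)
&= \min_\theta \, \mathbb{E}_{x \sim \Pr}\big[\log \Pr(x) - \log p(x;\theta)\big] \\
&= \max_\theta \, \mathbb{E}_{x \sim \Pr}\big[\log p(x;\theta)\big],
\end{aligned}
$$

since the entropy term $\mathbb{E}_{x \sim \Pr}[\log \Pr(x)]$ does not depend on $\theta$; with $\Pr(x)$ the empirical distribution of the data, the last line is the usual average log-likelihood.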
comment in response to post
- Variational inference with "reverse KL" corresponds to minimizing KL(q(z) || p(z | x))
- Variational inference with "forward KL" (which is what "swapping the KL" usually means) corresponds to minimizing KL(p(z | x) || q(z))
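Written out with the variational parameters phi explicit (a sketch in the same notation; the expectations make the asymmetry clear):

$$
\begin{aligned}
\text{reverse KL:}\quad &\min_\phi \, \mathrm{KL}\big(q(z;\phi) \,\|\, p(z \mid x)\big) = \min_\phi \, \mathbb{E}_{z \sim q(z;\phi)}\big[\log q(z;\phi) - \log p(z \mid x)\big] \\
\text{forward KL:}\quad &\min_\phi \, \mathrm{KL}\big(p(z \mid x) \,\|\, q(z;\phi)\big) = \min_\phi \, \mathbb{E}_{z \sim p(z \mid x)}\big[\log p(z \mid x) - \log q(z;\phi)\big]
\end{aligned}
$$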
comment in response to post
There are *three* distributions in this conversation which are distinct:
- Pr(x): the empirical dist of the data
- p(x, z): the assumed generative model, involving latent z
- q(z): the variational approximation of p(z | x)
MLE corresponds to minimizing KL(Pr(x) || p(x)), where z is marginalized out
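To make "z is marginalized out" explicit (a sketch, with theta denoting the generative model's parameters):

$$
p(x;\theta) = \int p(x, z;\theta)\, dz,
\qquad
\hat{\theta}_{\mathrm{MLE}} = \operatorname*{arg\,min}_\theta \, \mathrm{KL}\big(\Pr(x) \,\|\, p(x;\theta)\big).
$$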