spmontecarlo.bsky.social
Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him). Homepage: https://sites.google.com/view/sp-monte-carlo Seminar: https://sites.google.com/view/monte-carlo-semina
812 posts 2,179 followers 1,861 following
Regular Contributor
Active Commenter
comment in response to post
Yep!
comment in response to post
Bridged up to the nines, we love to see it.
comment in response to post
the (rather condensed by my standards) slides for my own talk are available here: github.com/sampower88/t...
comment in response to post
There's some hope for a sequel next year!
comment in response to post
FWIW: onionesquereality.wordpress.com/2013/09/25/s...
comment in response to post
Same deal:
comment in response to post
Ah, sure! I find it interesting that at some level of abstraction, it's operationally the same. I also like that there are certain kinds of robustness (to e.g. mis-specification, computational aspects) which are baked into this view, though of course you lose certain other things.
comment in response to post
small typo here - it should rather be γ = 𝐄[ΘG(X)⊤]⋅𝐄[G(X)G(X)⊤]⁻¹ and θ̂(x) = 𝐄[ΘG(X)⊤]⋅𝐄[G(X)G(X)⊤]⁻¹⋅G(x) = 𝐄[Θ K(X, x)] for some function K which can be worked out from the above.
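One way to see the K: since 𝐄[G(X)G(X)⊤]⁻¹⋅G(x) is deterministic, it can be pulled inside the expectation, giving θ̂(x) = 𝐄[Θ ⋅ G(X)⊤ 𝐄[G(X)G(X)⊤]⁻¹ G(x)], i.e. K(X, x) = G(X)⊤ 𝐄[G(X)G(X)⊤]⁻¹ G(x).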
comment in response to post
In any case, it's quite sweet stuff. The baby version of all of this is "Bayes Linear"; the more advanced version is "Kernel Bayes Rule" or "The Linear Conditional Expectation". It's also connected in some ways to different ways of amortising Bayesian inference, as popular in e.g. modern SBI.
comment in response to post
There's a catch, though: if one isn't careful, it can happen that L(θ, x) is sometimes negative! That is, one can define a roughly-consistent way of approximating conditional expectations in a given basis, which correspond to integrating against some measure, but it might be a signed measure!
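To see the catch numerically, here is a rough Python sketch using a toy model chosen purely for illustration (Θ ~ N(0, 1), X | Θ ~ N(Θ, 1), G(x) = (1, x)); plugging into the L(θ, x) defined in the comment further down gives L(θ, x) ≈ 1 + θx/2, which goes negative once θx < -2.

import numpy as np

rng = np.random.default_rng(0)

# Toy joint model, chosen only for illustration:
# Theta ~ N(0, 1), X | Theta ~ N(Theta, 1), features G(x) = (1, x).
n = 200_000
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

def G(v):
    return np.stack([np.ones_like(v), v], axis=-1)

# M = E[G(X) G(X)^T]^{-1}, estimated from samples of the joint.
GX = G(x)
M = np.linalg.inv(GX.T @ GX / n)

def L(theta0, x0, n_inner=200_000):
    # L(theta, x) = E[ G(X)^T M G(x) | Theta = theta ], by fresh simulation.
    x_cond = theta0 + rng.standard_normal(n_inner)
    return np.mean(G(x_cond) @ M @ G(np.asarray(x0)))

print(L(3.0, -3.0))  # roughly 1 + 0.5 * 3 * (-3) = -3.5: a negative "likelihood"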
comment in response to post
This is not specific to the parameter itself; for any function ϕ = H(θ), one finds that the best G-linear predictor of ϕ will _also_ be expressible in this way, for the same L. As such, it becomes tempting to define the "posterior" of θ given x as being Posterior(dθ | X = x) = Prior(dθ)⋅L(θ, x).
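Spelled out: the best G-linear predictor of ϕ = H(θ) is 𝐄[H(Θ)G(X)⊤]⋅𝐄[G(X)G(X)⊤]⁻¹⋅G(x), and the same conditioning argument as in the comment below rewrites this as ϕ̂(x) = ∫ Prior(dθ)⋅L(θ,x)⋅H(θ), i.e. integrating H against Prior(dθ)⋅L(θ,x).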
comment in response to post
One can also do something quite evocative here: defining L(θ,x) = 𝐄[G(X)⊤ {𝐄[G(X)G(X)⊤]⁻¹} G(x) | θ] (where the expectation is taken with respect to the law of X given θ), it holds that θ̂(x) = ∫ Prior(dθ)⋅L(θ,x)⋅θ, i.e. this "G-conditional expectation" resembles a posterior mean!
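For what it's worth, the identity is essentially one application of the tower property (writing M = 𝐄[G(X)G(X)⊤]⁻¹):

θ̂(x) = 𝐄[ΘG(X)⊤]⋅M⋅G(x)
      = 𝐄[Θ ⋅ G(X)⊤ M G(x)]
      = 𝐄[Θ ⋅ 𝐄[G(X)⊤ M G(x) | Θ]]
      = ∫ Prior(dθ)⋅L(θ,x)⋅θ.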
comment in response to post
This is a familiar problem: linear regression! We can even write down the optimal solution as γ = 𝐄[G(X)G(X)⊤]⁻¹𝐄[ΘG(X)⊤], with the expectation taken over the joint distribution of (Θ,X). This then yields the estimator θ̂(x) = 𝐄[G(X)G(X)⊤]⁻¹𝐄[Θ⋅〈G(X), G(x)〉].
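As a concrete (and entirely illustrative) sketch: in the conjugate Gaussian model Θ ~ N(0, 1), X | Θ ~ N(Θ, 1) with features G(x) = (1, x, x²), the exact posterior mean is x/2, and the estimator above recovers it from samples of the joint, since γ is just an ordinary least-squares fit of the parameter draws onto the features of the observation draws.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative model, chosen only for this sketch:
# Theta ~ N(0, 1), X | Theta ~ N(Theta, 1), feature map G(x) = (1, x, x**2).
n = 100_000
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

def G(v):
    return np.stack([np.ones_like(v), v, v**2], axis=-1)

GX = G(x)

# gamma solves E[G(X)G(X)^T] gamma = E[Theta G(X)], i.e. least squares of
# the parameter samples on the features of the observation samples.
gamma = np.linalg.solve(GX.T @ GX / n, GX.T @ theta / n)

def theta_hat(x_new):
    # Best G-linear predictor <gamma, G(x_new)> of Theta given X = x_new.
    return G(np.asarray(x_new)) @ gamma

print(theta_hat(1.5))  # close to the exact posterior mean 1.5 / 2 = 0.75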
comment in response to post
Something fun which one can ask is the following: given some feature mapping G which sends observations to a linear space, what is our best estimator of the model parameters which is linear in G? That is, find γ such that on average, θ ≈〈γ,G(x)〉 in the sense of minimising mean squared error.
comment in response to post
However, there exist equivalent formulations which don't directly ask for the posterior. For example, Bayes estimators minimise error on average over the joint distribution of model parameters and observations, and this is well-posed without introducing the notion of the posterior.
comment in response to post
Oo, unfortunately not - I'll have to take a look!
comment in response to post
One thing which it illustrates nicely is that unbounded negative curvature is a good source of counter-intuition and counter-examples.
comment in response to post
some context:
comment in response to post
I definitely give Kolmogorov a pass on certain things which I wouldn't give to everybody!
comment in response to post
Yeah, you can definitely pursue this; the Menz-Schlichting result is sort of proven in this way IIRC. What's a bit interesting is that this decomposition strategy can work well for both negative results about basic dynamics and positive results about tempering-type extended dynamics.
comment in response to post
Anyways, I'm hoping to gradually flesh out a few more examples, if only for my own sake. I like the idea of getting these sorts of basic and intuitive results into a 'teachable' form, and this seems a good opportunity to do so.
comment in response to post
I've historically found it most natural to think about positive results rather than negative results (for uncomplicated reasons), but I do increasingly find the latter very interesting, particularly in terms of hammering down at questions about what 'really' makes a problem difficult, and so on.
comment in response to post
It's an odd story, then; the soft picture is very clear, as are the practical implications, but the details of how to technically materialise this intuition into simple proofs are - surprisingly - not as fully realised as one (or at least I) might expect.
comment in response to post
One would hope that there's a neat way to port that intuition over to a more finitary setting, which is partially true - a heroic work of Menz and Schlichting shows that with a similar set of assumptions, one can make sharp estimates on e.g. the spectral gap of the process. But it's quite some work!
comment in response to post
There's a nice - and very general - result associated with Arrhenius, Eyring, Kramers, etc. which describes the asymptotic behaviour of between-mode transition times, but it feels very much an asymptotic result, couched in the language of low-temperature limits and large deviations.
comment in response to post
The situation is particularly stark in dimension greater than one, where the possibility of making explicit calculations becomes quite a bit more limited. In one dimension, you only have to block off one path between modes; in higher dimensions, there are additional routes for a particle to follow.
comment in response to post
Still, if you go looking for concrete results showing that for some model problem (or family of problems), the situation is bad in terms of { spectral gap, conductance, mixing time, etc. }, then you find there is not such a wealth of examples for which the details have been worked through properly.
comment in response to post
Basically, it's sufficiently well-appreciated that certain types of multimodality will cause plenty of basic samplers to fail catastrophically, and so it's not necessarily that insightful to work through the details of just how bad things are - in short, the message is to do something else.
comment in response to post
bsky.app/profile/spmo...
comment in response to post
sites.stat.washington.edu/raftery/Rese...
comment in response to post
From some related lecture notes:
comment in response to post
I had heard of it / seen it cited lots before, but I think that I had the wrong impression of what type of paper it was. It's very dense with new information, pay-offs, etc. in a way that I didn't necessarily expect (even as somebody who knows many of the objects involved).