antoine-mln.bsky.social
doing a phd in RL/online learning on questions related to exploration and adaptivity > https://antoine-moulin.github.io/
56 posts 2,069 followers 205 following
comment in response to post
link: arxiv.org/abs/2502.139...
comment in response to post
Congrats Wil! 🫶🫶
comment in response to post
Every reason is a good reason to visit Csaba
comment in response to post
I read the thread and answered a bit quickly, my bad lol. I agree with your 2nd reply, and what I had in mind was indeed like your 1st comment (eg in RL, if after exploring optimistically you get a good policy, then it suggests your CIs on the transition estimates were good enough)
comment in response to post
I guess an example is when you use optimism to explore in sequential decision-making problems: the regret you get pretty much scales as the width of the confidence intervals you use?
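(rough version of the standard argument, in my own notation rather than anything from the thread: play x_t maximizing UCB_t, and assume the confidence intervals are valid, i.e. LCB_t(x) <= f(x) <= UCB_t(x); then f(x*) <= UCB_t(x*) <= UCB_t(x_t) and f(x_t) >= LCB_t(x_t), so)

\[
R_T = \sum_{t=1}^{T} \big( f(x^\star) - f(x_t) \big) \;\le\; \sum_{t=1}^{T} \big( \mathrm{UCB}_t(x_t) - \mathrm{LCB}_t(x_t) \big),
\]

i.e. the regret is bounded by the total width of the confidence intervals at the points actually played.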
comment in response to post
I wouldn’t qualify BCO as « on the verge of being solved » but the recent advances are indeed exciting!
comment in response to post
I like your handle :)
comment in response to post
(I mean for GD)
comment in response to post
Stupid question but do we know if the new rate can be improved (or not)? The exponent seems a bit random
comment in response to post
Then a consequence of Sandroni's theorem (bsky.app/profile/aaro...) is that there is no empirical test that can distinguish this scenario from the alternative scenario in which the forecaster actually knows the outcome distribution at each round and correctly forecasts the most likely outcome. 2/2
comment in response to post
Thanks!
comment in response to post
+ @rl-theory.bsky.social
comment in response to post
oops Gene is already in
comment in response to post
@hahahaudrey.bsky.social @gioramponi.bsky.social @jasondeanlee.bsky.social @geneli.bsky.social @aldopacchiano.bsky.social @sikatasengupta.bsky.social
comment in response to post
agreed :) and I think a slightly more complete version will come out relatively soon (🤞) bsky.app/profile/anto...
comment in response to post
like « people who use Jensen’s inequality the wrong way »
comment in response to post
it’s surprising it’s not built into the app already (or that you at least get a notification): what if someone adds me to a shady starter pack I don’t want to be part of?
comment in response to post
yes this is why we disagree :)
comment in response to post
for me the *problem def* of online RL in infinite horizon = streaming RL. then the alg may or may not update the policy at every time step. in finite horizon, one « time step » would typically correspond to a whole episode, so not really one update per sample
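(a toy sketch of what I mean by « one update per sample », made up for illustration: plain epsilon-greedy Q-learning on a hypothetical chain MDP, not the algorithm from the paper linked above. the agent lives on a single continuing stream, with no episode boundaries, and may refresh its estimates after every transition)

```python
import random

N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

# toy chain MDP (my own example): action 0 moves left, action 1 moves right,
# reward 1 for reaching the last state
def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
state = 0
for t in range(10_000):  # one continuing stream, no episodes, no batches
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # streaming / online update: estimates (and hence the policy) can change
    # after every single sample
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
    state = next_state

print([[round(q, 2) for q in row] for row in Q])
```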
comment in response to post
Chaps 2-4-5-6 of Puterman’s book for the basics on MDPs?
comment in response to post
ofc, I think most theorists are aware of the importance of implem details. all I’m saying is RL is quite confusing already and I’m not sure introducing a new term to refer to a canonical setting (or « true » to quote you) is helping. otoh I’m fine with their « stream Q-learning » for instance
comment in response to post
I understand the emphasis for deep RL but since we have words to describe this we might as well use them idk
comment in response to post
yeah that’s just online RL for infinite horizon. the batch updates often considered (especially if model-based) are an algorithmic choice which pretty much goes back to UCRL (iirc), not a problem definition. if you look at model-free you can find such updates, eg arxiv.org/abs/1910.07072
comment in response to post
I’ve never seen this terminology before. Is it not just online RL?
comment in response to post
@sharky6000.bsky.social is the owner
comment in response to post
I’m down 🫡
comment in response to post
cool idea! intuitively it feels like it could be related to a notion of coverability sometimes used in RL (eg def 2 arxiv.org/abs/2210.04157; typically bad if the MDP is a tree). at least they seem to serve the same purpose, although you can’t use the latter
comment in response to post
for RL I unfortunately don't have a single ref and would typically go over several books/classic papers. I remember this was quite frustrating when I started doing RL... and for optimization I never really know which one to look at among the classic ones
comment in response to post
ML in general:
- Learning Theory from First Principles, Bach (www.di.ens.fr/~fbach/ltfp_...), I only read an early draft that was half the size it is now but I enjoyed the clarity + precision
- Understanding Deep Learning, Simon Prince (udlbook.github.io/udlbook/), only skimmed but heard many good reviews
comment in response to post
a bit more niche, but when it comes to learning with bandit feedback, I cannot overstate how well written these books are:
- Bandit Algorithms, Lattimore, Szepesvari (tor-lattimore.com/downloads/bo...)
- Bandit Convex Optimization, Lattimore (tor-lattimore.com/downloads/cv...)
comment in response to post
my go-to ref in online learning (and perhaps one of my fav reads) is "A Modern Introduction to Online Learning" from Orabona (arxiv.org/abs/1912.13213). not as complete as "Prediction, Learning, and Games" from Cesa-Bianchi and Lugosi but still very cool