kallus.bsky.social
๐Ÿณ๏ธโ€๐ŸŒˆ๐Ÿ‘จโ€๐Ÿ‘จโ€๐Ÿ‘งโ€๐Ÿ‘ฆ interested in causal inference, experimentation, optimization, RL, statML, econML, fairness Cornell & Netflix www.nathankallus.com
5 posts 414 followers 84 following
comment in response to post
arxiv.org/abs/2302.02392 In offline RL, we replace exploration with assumptions that the data are nice. We try to make these minimal by refining standard realizability and coverage assumptions to single policies. We do this via a minimax formulation with strong guarantees for learning the saddle point.
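(A rough sketch of what a minimax/saddle-point formulation like this can look like with linear function classes; the dataset, features, step sizes, and regularizer below are hypothetical illustrations, not the paper's estimator.)

```python
# Sketch: solve  min_q max_w  E[ w(s,a) * (r + gamma*q(s') - q(s,a)) ] - 0.5*E[w^2]
# over linear classes by gradient descent-ascent on the saddle-point objective.
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 1000, 5, 0.99

# Hypothetical logged transitions: features of (s,a), reward, features of (s', pi(s')).
phi_sa = rng.normal(size=(n, d))      # phi(s, a)
rewards = rng.normal(size=n)          # r
phi_next = rng.normal(size=(n, d))    # phi(s', pi(s'))

theta_q = np.zeros(d)  # q(s,a) = phi(s,a) @ theta_q
theta_w = np.zeros(d)  # w(s,a) = phi(s,a) @ theta_w (adversarial test function)

lr_q, lr_w = 1e-2, 1e-2
for _ in range(2000):
    w = phi_sa @ theta_w
    bellman_err = rewards + gamma * (phi_next @ theta_q) - phi_sa @ theta_q
    grad_q = ((gamma * phi_next - phi_sa).T @ w) / n          # d/d theta_q of the Lagrangian
    grad_w = (phi_sa.T @ (bellman_err - w)) / n               # d/d theta_w (with 0.5*w^2 stabilizer)
    theta_q -= lr_q * grad_q   # descent step for the learner
    theta_w += lr_w * grad_w   # ascent step for the adversary

print("learned q weights:", theta_q)
```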
comment in response to post
arxiv.org/abs/2305.15703 RL only needs the mean reward-to-go (the q-fn), so why is distRL (learning the whole reward-to-go dist) so empirically effective? We prove distRL really shines when the optimal policy has small loss. When that's true, least squares (q-learning) misses the signal due to heteroskedasticity.
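(A toy illustration of the heteroskedasticity point, not the paper's analysis: when return noise depends on the input, plain least squares is far noisier than a fit that also models the variance; the data, variance structure, and iteration counts below are made up.)

```python
# Compare plain least squares with a crude "distributional" fit (Gaussian MLE
# with input-dependent variance) when reward-to-go noise is heteroskedastic.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, size=n)
true_mean = 0.5 * x                          # true mean reward-to-go
sigma = np.where(x > 0, 3.0, 0.1)            # returns are much noisier for x > 0
y = true_mean + sigma * rng.normal(size=n)   # sampled reward-to-go

X = np.column_stack([x, np.ones(n)])

# 1) Ordinary least squares: every sample weighted equally.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Model the noise too: alternate variance estimation and weighted least squares.
beta = beta_ols.copy()
for _ in range(20):
    resid = y - X @ beta
    var_hat = np.where(x > 0, resid[x > 0].var(), resid[x <= 0].var())
    W = 1.0 / var_hat
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))

print("true slope 0.5 | OLS slope %.3f | variance-aware slope %.3f"
      % (beta_ols[0], beta[0]))
```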
comment in response to post
arxiv.org/abs/2207.13081 Off-policy eval in POMDPs is tough b/c hidden states ruin memorylessness, inducing a curse of horizon. Using histories as instrumental variables, we derive a new Bellman eq for a new kind of v-fn. We solve it with minimax learning to get model-free eval using general fn apx.
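(A structural sketch of the instrument idea under strong simplifications, not the paper's estimator: with linear classes, a history-instrumented Bellman moment condition can be solved by a 2SLS/GMM-style projection; the logged POMDP data and feature maps below are hypothetical.)

```python
# Treat history features z(h) as instruments for a Bellman-style moment condition
#   E[ z(h) * (r + gamma * v_next - v_cur) ] = 0
# and solve for a linear value function.
import numpy as np

rng = np.random.default_rng(2)
n, d_v, d_z, gamma = 5000, 4, 6, 0.95

z_hist = rng.normal(size=(n, d_z))     # z(h_t): features of the observed history (instrument)
phi_cur = rng.normal(size=(n, d_v))    # features defining v at the current step
phi_next = rng.normal(size=(n, d_v))   # features defining v at the next step
rewards = rng.normal(size=n)

# Moment condition in matrix form:  Z^T (R + gamma * Phi_next @ theta - Phi_cur @ theta) = 0
# => (Z^T (Phi_cur - gamma * Phi_next)) theta = Z^T R, solved by least squares (over-identified).
A = z_hist.T @ (phi_cur - gamma * phi_next)    # (d_z, d_v)
b = z_hist.T @ rewards                         # (d_z,)
theta, *_ = np.linalg.lstsq(A, b, rcond=None)

v_estimates = phi_cur @ theta                  # evaluated value function on the logged data
print("policy value estimate:", v_estimates.mean())
```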