kallus.bsky.social
🏳️‍🌈👨‍👨‍👧‍👦 interested in causal inference, experimentation, optimization, RL, statML, econML, fairness
Cornell & Netflix
www.nathankallus.com
5 posts
414 followers
84 following
arxiv.org/abs/2302.02392 In offline RL, we replace exploration with assumptions that the data is nice. We try to make these minimal by refining standard realizability and coverage assumptions so they need only hold for single policies. We do this via a minimax formulation with strong guarantees for learning the saddle point.
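A minimal sketch of the general minimax idea, not the paper's algorithm: with toy linear function classes and synthetic offline data (all names, dimensions, and step sizes below are illustrative assumptions), gradient descent-ascent drives an adversarially weighted Bellman error to its saddle point.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 4                        # number of offline transitions, feature dimension
phi_sa = rng.normal(size=(n, d))     # features of (state, action) in the data
phi_next = rng.normal(size=(n, d))   # features of (next state, target-policy action)
r = rng.normal(size=n)               # observed rewards
gamma = 0.9

theta = np.zeros(d)   # q-function parameters: q(s, a) = phi_sa @ theta
omega = np.zeros(d)   # adversarial weight parameters: w(s, a) = phi_sa @ omega
lr = 0.05

# Objective: min over theta, max over omega of  mean(w * bellman_err) - 0.5 * mean(w**2)
for _ in range(2000):
    w = phi_sa @ omega
    bellman_err = r + gamma * (phi_next @ theta) - phi_sa @ theta
    grad_theta = (gamma * phi_next - phi_sa).T @ w / n          # descent for the learner
    grad_omega = phi_sa.T @ bellman_err / n - phi_sa.T @ w / n  # ascent for the adversary
    theta -= lr * grad_theta
    omega += lr * grad_omega

# At the saddle point the adversarially weighted Bellman error is driven toward zero.
print("weighted Bellman error:",
      np.mean((phi_sa @ omega) * (r + gamma * (phi_next @ theta) - phi_sa @ theta)))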
arxiv.org/abs/2305.15703 RL only needs the mean reward-to-go (the q-fn), so why is distRL (learning the whole reward-to-go dist) so empirically effective? We prove distRL does really well when the optimal policy has small loss. When that's true, least squares (q-learning) misses the signal due to heteroskedasticity.
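A loose statistical analogy, not the paper's analysis: when the noise is heteroskedastic, plain least squares for the mean is inefficient, while a fit that models the per-sample scale (a stand-in for modeling the whole distribution) recovers the parameters much more precisely. All data and values below are toy assumptions.

import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -0.5])   # toy "reward-to-go" regression coefficients

ols_err, dist_err = [], []
for _ in range(200):
    X = rng.normal(size=(300, 2))
    sigma = 0.1 + 3.0 * (X[:, 0] > 0)                  # noise scale depends on the state
    y = X @ theta_true + sigma * rng.normal(size=300)

    # plain least squares (the q-learning analogue): ignores heteroskedasticity
    theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # scale-aware analogue: model the per-sample noise level and weight accordingly
    w = 1.0 / sigma**2
    theta_dist = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    ols_err.append(np.sum((theta_ols - theta_true) ** 2))
    dist_err.append(np.sum((theta_dist - theta_true) ** 2))

print("least-squares param error: ", np.mean(ols_err))
print("variance-aware param error:", np.mean(dist_err))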
arxiv.org/abs/2207.13081 Off-policy eval in POMDPs is tough b/c hidden states ruin memorylessness, inducing a curse of horizon. Using histories as instrumental variables, we derive a new Bellman eq for a new kind of v-fn. We solve it with minimax learning to get model-free eval using general fn apx.
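A generic minimax instrumental-variable sketch, not the paper's POMDP estimator: history features stand in for the instrument z, the hidden state plays the unobserved confounder, and the new Bellman residual is abstracted into a simple residual y - g(x). Everything below (data, dimensions, step sizes) is an assumption for illustration.

import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=(n, 2))                     # "history" features (the instrument)
u = rng.normal(size=n)                          # unobserved confounder (hidden state)
x = z @ np.array([1.0, 0.5]) + u                # endogenous regressor
y = 2.0 * x + u + 0.1 * rng.normal(size=n)      # outcome; true coefficient is 2.0
X = x[:, None]

beta = np.zeros(1)    # learner parameters: g(x) = x * beta
alpha = np.zeros(2)   # adversary parameters: f(z) = z @ alpha
lr = 0.05

# Objective: min over beta, max over alpha of  mean(f * (y - g(x))) - 0.5 * mean(f**2)
for _ in range(3000):
    resid = y - X @ beta
    f = z @ alpha
    grad_beta = -(X.T @ f) / n                  # descent for the learner
    grad_alpha = (z.T @ resid - z.T @ f) / n    # ascent for the adversary
    beta -= lr * grad_beta
    alpha += lr * grad_alpha

print("minimax IV estimate of the coefficient:", beta[0])  # should land close to 2.0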