kallus.bsky.social
🏳️‍🌈👨‍👨‍👧‍👦 interested in causal inference, experimentation, optimization, RL, statML, econML, fairness
Cornell & Netflix
www.nathankallus.com
5 posts
414 followers
84 following
arxiv.org/abs/2302.02392 In offline RL, we replace exploration with assumptions that the data is nice. We try to make these minimal by refining standard realizability and coverage assumptions so they need only hold for single policies. We do this via a minimax formulation with strong guarantees for learning the saddle point.
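A minimal sketch of the general minimax idea, not the paper's algorithm: with toy linear function classes and synthetic offline data (all names, dimensions, and step sizes below are illustrative assumptions), gradient descent-ascent drives an adversarially weighted Bellman error to its saddle point.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 4                        # number of offline transitions, feature dimension
phi_sa = rng.normal(size=(n, d))     # features of (state, action) in the data
phi_next = rng.normal(size=(n, d))   # features of (next state, target-policy action)
r = rng.normal(size=n)               # observed rewards
gamma = 0.9

theta = np.zeros(d)   # q-function parameters: q(s, a) = phi_sa @ theta
omega = np.zeros(d)   # adversarial weight parameters: w(s, a) = phi_sa @ omega
lr = 0.05

# Objective: min over theta, max over omega of  mean(w * bellman_err) - 0.5 * mean(w**2)
for _ in range(2000):
    w = phi_sa @ omega
    bellman_err = r + gamma * (phi_next @ theta) - phi_sa @ theta
    grad_theta = (gamma * phi_next - phi_sa).T @ w / n          # descent for the learner
    grad_omega = phi_sa.T @ bellman_err / n - phi_sa.T @ w / n  # ascent for the adversary
    theta -= lr * grad_theta
    omega += lr * grad_omega

# At the saddle point the adversarially weighted Bellman error is driven toward zero.
print("weighted Bellman error:",
      np.mean((phi_sa @ omega) * (r + gamma * (phi_next @ theta) - phi_sa @ theta)))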
arxiv.org/abs/2305.15703 RL only needs the mean reward-to-go (the q-fn), so why is distRL (learning the whole reward-to-go dist) so empirically effective? We prove distRL does really well when the optimal policy has small loss. When that's true, least squares (q-learning) misses the signal due to heteroskedasticity.
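A loose statistical analogy, not the paper's analysis: when the noise is heteroskedastic, plain least squares for the mean is inefficient, while a fit that models the per-sample scale (a stand-in for modeling the whole distribution) recovers the parameters much more precisely. All data and values below are toy assumptions.

import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -0.5])   # toy "reward-to-go" regression coefficients

ols_err, dist_err = [], []
for _ in range(200):
    X = rng.normal(size=(300, 2))
    sigma = 0.1 + 3.0 * (X[:, 0] > 0)                  # noise scale depends on the state
    y = X @ theta_true + sigma * rng.normal(size=300)

    # plain least squares (the q-learning analogue): ignores heteroskedasticity
    theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # scale-aware analogue: model the per-sample noise level and weight accordingly
    w = 1.0 / sigma**2
    theta_dist = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    ols_err.append(np.sum((theta_ols - theta_true) ** 2))
    dist_err.append(np.sum((theta_dist - theta_true) ** 2))

print("least-squares param error: ", np.mean(ols_err))
print("variance-aware param error:", np.mean(dist_err))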
arxiv.org/abs/2207.13081 Off-policy eval in POMDPs is tough b/c hidden states ruin memorylessness, inducing a curse of horizon. Using histories as instrumental variables, we derive a new Bellman eq for a new kind of v-fn. We solve it with minimax learning to get model-free eval using general fn apx.
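A generic minimax instrumental-variable sketch, not the paper's POMDP estimator: history features stand in for the instrument z, the hidden state plays the unobserved confounder, and the new Bellman residual is abstracted into a simple residual y - g(x). Everything below (data, dimensions, step sizes) is an assumption for illustration.

import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=(n, 2))                     # "history" features (the instrument)
u = rng.normal(size=n)                          # unobserved confounder (hidden state)
x = z @ np.array([1.0, 0.5]) + u                # endogenous regressor
y = 2.0 * x + u + 0.1 * rng.normal(size=n)      # outcome; true coefficient is 2.0
X = x[:, None]

beta = np.zeros(1)    # learner parameters: g(x) = x * beta
alpha = np.zeros(2)   # adversary parameters: f(z) = z @ alpha
lr = 0.05

# Objective: min over beta, max over alpha of  mean(f * (y - g(x))) - 0.5 * mean(f**2)
for _ in range(3000):
    resid = y - X @ beta
    f = z @ alpha
    grad_beta = -(X.T @ f) / n                  # descent for the learner
    grad_alpha = (z.T @ resid - z.T @ f) / n    # ascent for the adversary
    beta -= lr * grad_beta
    alpha += lr * grad_alpha

print("minimax IV estimate of the coefficient:", beta[0])  # should land close to 2.0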