yeah, that's just online RL in the infinite-horizon setting. the batch updates people often use (especially in model-based methods) are an algorithmic choice that pretty much goes back to UCRL (iirc), not part of the problem definition
I agree that from a theory POV it's just a practical choice, but practical choices matter much more than theory would have us believe... Actually, most of the profound advances in the practical results of deep RL come from understanding which practical details matter for SGD to start working properly in RL.
So I don't think we should just say "yeah, that's just online RL". Putting streaming RL in the name emphasizes the departure from the batched-data practice that is widely used today, which is the whole point of the paper.
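To make the distinction concrete, here is a minimal sketch, assuming "streaming" means, in the sense used in this thread, that each transition drives exactly one update the moment it arrives and is then discarded, while "batched" means transitions go into a replay buffer that is sampled in minibatches. Everything here is hypothetical and illustrative, not taken from the paper under discussion: the tabular TD(0) learner, the 3-state continuing ring task, and the names `streaming_td0`, `batched_td0`, `batch_size`, `update_every`. Both learners solve the same infinite-horizon problem; only the update scheme differs.

```python
import random

# Hypothetical toy continuing (infinite-horizon) task: a deterministic 3-state ring.
# Leaving state 0 gives reward 1, every other transition gives 0. Chosen only because
# its true values are easy to compute by hand: V = [8/7, 2/7, 4/7] for gamma = 0.5.
N_STATES = 3
GAMMA = 0.5


def step(s):
    """Return (reward, next_state) for the toy ring."""
    return (1.0 if s == 0 else 0.0), (s + 1) % N_STATES


def streaming_td0(n_steps, alpha=0.5):
    """Streaming TD(0): each transition is used for one update, then discarded."""
    v = [0.0] * N_STATES
    s = 0
    for _ in range(n_steps):
        r, s_next = step(s)
        v[s] += alpha * (r + GAMMA * v[s_next] - v[s])  # update immediately
        s = s_next  # no reset: the interaction stream never ends
    return v


def batched_td0(n_steps, alpha=0.5, batch_size=32, update_every=10, seed=0):
    """Batched TD(0): transitions are stored in a replay buffer, and every
    `update_every` steps one averaged update is applied on a sampled minibatch."""
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    buffer = []
    s = 0
    for t in range(1, n_steps + 1):
        r, s_next = step(s)
        buffer.append((s, r, s_next))  # kept and reused, not discarded
        s = s_next
        if t % update_every == 0:
            batch = [rng.choice(buffer) for _ in range(batch_size)]
            # TD errors use the values frozen before this update; then the
            # mean update over the minibatch is applied.
            deltas = [(bs, br + GAMMA * v[bn] - v[bs]) for bs, br, bn in batch]
            for bs, d in deltas:
                v[bs] += alpha * d / batch_size
    return v
```

For example, `streaming_td0(3000)` and `batched_td0(3000)` should both end up close to `[8/7, 2/7, 4/7]`. In a toy tabular problem like this the two schemes reach the same answer, so the sketch only shows where the data pipeline differs (buffer, minibatch sampling, update frequency); it says nothing about which scheme works better with deep networks and SGD, which is the practical question raised above.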
if you look at model-free methods you can find such updates, e.g. https://arxiv.org/abs/1910.07072