The Bitter Lesson comes back all the time. For RL, it's about time we recognized how underutilized our hardware is. The JAX-based RL libraries opened the way, but there is much more work ahead on parallel RL algorithms.
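To make the hardware-utilization point concrete, here is a toy sketch of the batched-environment idea behind those JAX-based RL stacks. Plain Python stands in for JAX, and the environment and all names are made up for illustration:

```python
def step(state, action):
    """One toy environment transition: move toward a goal at 0."""
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward

def batched_step(states, actions):
    """Step many environment instances in lockstep.

    In a JAX-based stack this Python loop would be a single
    jax.vmap(step) call running on accelerator hardware, which is
    where the unused throughput comes from; here it is an ordinary
    list comprehension purely for illustration.
    """
    results = [step(s, a) for s, a in zip(states, actions)]
    next_states = [r[0] for r in results]
    rewards = [r[1] for r in results]
    return next_states, rewards

# Three "parallel" environment instances stepped at once.
states = [1.0, -2.0, 0.5]
actions = [-1.0, 2.0, -0.5]
next_states, rewards = batched_step(states, actions)
```

The design point is that the per-environment `step` stays scalar and the batching is applied outside it, which is what lets a vectorizing transform like `jax.vmap` scale the same code to thousands of instances.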
This isn’t a general solution to RL. The point is to make learning algorithms sample-efficient. If the environment you are doing RL in is the real world, you can’t make the “environment go fast”.
With “infinite samples”, you can randomly sample policies until you stumble on one with high reward.
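A minimal sketch of that brute-force idea, assuming a toy one-parameter environment (the environment, the hidden optimum, and all names are hypothetical):

```python
import random

# Toy "environment": reward is higher the closer the policy's single
# parameter is to a hidden optimum. Purely illustrative.
HIDDEN_OPTIMUM = 0.7

def evaluate(policy_param):
    """Return the reward obtained by a policy (here: one scalar)."""
    return -abs(policy_param - HIDDEN_OPTIMUM)

def random_policy_search(num_samples, seed=0):
    """Sample policies uniformly and keep the best one seen."""
    rng = random.Random(seed)
    best_param, best_reward = None, float("-inf")
    for _ in range(num_samples):
        param = rng.uniform(0.0, 1.0)
        reward = evaluate(param)
        if reward > best_reward:
            best_param, best_reward = param, reward
    return best_param, best_reward

best_param, best_reward = random_policy_search(num_samples=100_000)
```

With a sample budget this large the best parameter lands arbitrarily close to the optimum, which is exactly the thread's point: a fast simulator can buy you samples that the real world never will.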
I don't think it's a general solution to RL, but it is a way to get good policies for many of the problems I care about, in which I don't particularly care about real-world learning.
If one only cares about learning in simulators then they can simplify the problem. E.g., assume they have a perfect model, the environment state, and the ability to jump to arbitrary states.
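The three simplifying assumptions above can be sketched as a toy simulator interface (all names hypothetical; this is an illustration of the setting, not any particular simulator's API):

```python
class ToySim:
    """A toy simulator granting everything real-world RL lacks:
    a perfect model, full state observability, and arbitrary resets."""

    def __init__(self, state=0):
        self.state = state  # the full environment state, directly visible

    def get_state(self):
        return self.state

    def set_state(self, state):
        # Jump to an arbitrary state -- impossible outside a simulator.
        self.state = state

    def step(self, action):
        # Perfect model: the transition function is known exactly.
        self.state += action
        reward = -abs(self.state)
        return self.state, reward

# Jump straight to a promising state instead of re-experiencing
# the whole episode from the start.
sim = ToySim()
sim.set_state(5)
state, reward = sim.step(-5)
```

Each method corresponds to one assumption in the post above, and removing any of them (as the real world does) makes the problem qualitatively harder.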
This simpler setting is solved from a research perspective imo which is why engineering is the bottleneck.
Hmm, can you clarify what you mean by "solved from a research perspective"? I would say that even in that domain we don't always know how to efficiently construct a policy
You’re correct, there are plenty of simulated environments we can’t solve yet. But do you consider it a desirable solution to spin up 1 million parallel instances of an environment, sped up 100x, and solve it with PPO in low wall-clock time?
The distinction between real world and simulation is not quite the right one, though. The right abstraction is big worlds vs. small worlds [1]. We don't have algorithms that can learn in big worlds.
Okay I'm glad I asked because this makes clear the disagreement. A few things:
1) I don't consider either of those solved. OpenAI Five did not appropriately restrict the click rate, it's questionable whether AlphaStar reached superhuman level
2) Even given that, there are many problems that fall
1/2
I have tried and failed to make a similar argument for years. Every NSF proposal I submit about SoftEng for robotics gets destroyed because it's understood as mere development. It's much more than that: making the right tools yields thinking frameworks that become productivity and creativity amplifiers.
Some people don’t see my efforts to create better simulations during my PhD as "real science." I get that it’s not technically science in itself, but you need better simulations to do better science!
I've been arguing pretty much the same thing in quantum for over two decades now. Too many people in #quantumcomputing know so little about systems engineering and architecture that they don't know what they don't know.
Haven't people been doing that for a while, e.g. self-driving? In the limit of infinite storage, you could just have an infinite lookup table. Still, quite a few representation-learning advances were needed to get where we are today.
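The infinite-lookup-table limit can be sketched in a few lines (all names hypothetical):

```python
# With unbounded storage, a policy could in principle just memorize the
# best known action for every state ever encountered.
policy_table = {}

def record(state, best_action):
    """Store the best action found for a state."""
    policy_table[state] = best_action

def act(state, default_action="noop"):
    """Look the state up; fall back to a default for unseen states.
    That fallback is exactly where representation learning earns its
    keep: generalizing to new states instead of memorizing old ones."""
    return policy_table.get(state, default_action)

record("red_light", "brake")
record("green_light", "accelerate")
```

In any big world the table never covers more than a vanishing fraction of states, which is why the representation-learning advances mentioned above were needed.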
We can make the recipe more efficient but there is no research bottleneck imo.
Real-world learning requires new ideas. Existing algorithms completely fail.
[1] The Big World Hypothesis and its Ramifications
https://openreview.net/pdf?id=Sv7DazuCn8
https://www.microsoft.com/en-us/research/uploads/prod/2020/11/Leiserson-et-al-Theres-plenty-of-room-at-the-top.pdf