Today’s RL algorithms are usually not great at long-term planning in complex environments, mainly because long-term planning in complex environments is a hard problem — e.g., the combinatorial explosion of possibilities. (So much the worse for today’s RL algorithms!) But I don’t think that relates to humans 1/3
Comments
I still think this misses something: the combinatorial explosion problem affects human minds' learning process as well, and it seems to be mostly solved via mimesis / cultural defaults / system 2 planning — all of which, notably, aren't direct learning.