carlosferrazza.bsky.social • 75 days ago

🚨 New reinforcement learning algorithms 🚨

Excited to announce MaxInfoRL, a class of model-free RL algorithms that solves complex continuous control tasks (including vision-based!) by steering exploration towards informative transitions.

Details in the thread 👇

Comments

carlosferrazza.bsky.social•75 days ago

The core principle is to balance extrinsic rewards with intrinsic exploration. MaxInfoRL achieves this by 1) using an ensemble of dynamics models to estimate information gain, and 2) incorporating this as an automatically-tuned exploration bonus in addition to policy entropy.
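As a rough illustration of point 1, here is a minimal sketch of turning ensemble disagreement into an information-gain bonus. It is not taken from the paper or the released code: the function name `info_gain_bonus`, the Gaussian-noise assumption, and the `noise_std` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def info_gain_bonus(ensemble_preds, noise_std=0.1):
    """Approximate information gain from ensemble disagreement (illustrative only).

    ensemble_preds: array of shape (n_models, batch, state_dim) holding each
    ensemble member's next-state prediction for the same (state, action) batch.
    noise_std: assumed aleatoric noise scale (a hypothetical hyperparameter).
    """
    # Epistemic variance: how much the ensemble members disagree, per state dimension.
    epistemic_var = np.var(ensemble_preds, axis=0)
    # Under a Gaussian assumption: 0.5 * sum_j log(1 + var_j / noise_var).
    return 0.5 * np.sum(np.log1p(epistemic_var / noise_std**2), axis=-1)

rng = np.random.default_rng(0)
base = rng.normal(size=(1, 4, 3))
agree = np.tile(base, (5, 1, 1))                           # all 5 models predict the same thing
disagree = agree + rng.normal(scale=0.5, size=(5, 4, 3))   # models disagree

bonus_agree = info_gain_bonus(agree)
bonus_disagree = info_gain_bonus(disagree)
```

Transitions where the models agree get a bonus near zero, while transitions where they disagree get a positive bonus, so the agent is pushed towards the latter.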

carlosferrazza.bsky.social•75 days ago

While standard Boltzmann exploration (e.g., SAC) focuses only on action entropy, MaxInfoRL maximizes entropy in both state and action spaces! This proves to be crucial when dealing with complex exploration settings.
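One plausible way to picture the difference from SAC is to add an information-gain term next to the entropy term in the soft value target. This is a sketch under assumptions, not the authors' exact update: `soft_target`, `beta`, and the way the bonus enters the target are hypothetical, and in the actual method the bonus weight is tuned automatically.

```python
def soft_target(reward, next_q, next_log_prob, next_info_gain,
                alpha, beta, gamma=0.99, done=0.0):
    """SAC-style bootstrapped target with an extra exploration bonus (illustrative).

    alpha weights the action-entropy term (as in SAC); beta weights the
    hypothetical information-gain bonus. Setting beta=0 gives the usual SAC target.
    """
    # Soft value of the next state: Q-value, plus entropy bonus, plus info-gain bonus.
    next_v = next_q - alpha * next_log_prob + beta * next_info_gain
    return reward + gamma * (1.0 - done) * next_v

sac_only = soft_target(1.0, 2.0, -0.5, 0.3, alpha=0.2, beta=0.0, gamma=0.9)
with_info = soft_target(1.0, 2.0, -0.5, 0.3, alpha=0.2, beta=1.0, gamma=0.9)
```

With `beta=0` the bonus vanishes and only action entropy shapes exploration. With `beta>0`, states the dynamics ensemble is uncertain about become more valuable to visit.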

carlosferrazza.bsky.social•75 days ago

MaxInfoRL is a simple, flexible, and scalable add-on to most recent RL algorithms. We combine it with SAC, REDQ, DrQv2, DrM, and more, consistently showing improved performance over the respective backbones.

carlosferrazza.bsky.social•75 days ago

Combined with DrQv2 and DrM, MaxInfoRL achieves state-of-the-art model-free performance on hard visual control tasks such as the DMControl humanoid and dog tasks, improving both sample efficiency and steady-state performance.

carlosferrazza.bsky.social•75 days ago

We are also excited to share both JAX and PyTorch implementations, making it simple for RL researchers to integrate MaxInfoRL into their training pipelines.

JAX (built on jaxrl): https://github.com/sukhijab/maxinforl_jax
PyTorch (based on @araffin.bsky.social's SB3): https://github.com/sukhijab/maxinforl_torch

carlosferrazza.bsky.social•75 days ago

Work led by the amazing @sukhijab.bsky.social at @ucberkeleyofficial.bsky.social AI Research, w/ Stelian Coros, @arkrause.bsky.social, and Pieter Abbeel!

Paper: https://arxiv.org/abs/2412.12098
Website: https://sukhijab.github.io/projects/maxinforl/
