ryanpsullivan.bsky.social
PhD Candidate at the University of Maryland researching reinforcement learning and autocurricula in complex, open-ended environments. Previously RL intern @ SonyAI, RLHF intern @ Google Research, and RL intern @ Amazon Science
22 posts 57 followers 55 following

"As researchers, we tend to publish only positive results, but I think a lot of valuable insights are lost in our unpublished failures." New blog post: Getting SAC to Work on a Massive Parallel Simulator (part I) araffin.github.io/post/sac-mas...

I’m heading to AAAI to present our work on multi-objective preference alignment for DPO from my internship with GoogleAI. If anyone wants to chat about RLHF, RL in games, curriculum learning, or open-ended environments, please reach out!

Looking for a principled evaluation method for ranking *general* agents or models, i.e., ones that get evaluated across a myriad of different tasks? I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

We released the OLMo 2 report! Ready for some more RL curves? 😏 This time, we applied RLVR iteratively! Our initial RLVR checkpoint on the RLVR dataset mix shows a low GSM8K score, so we did another RLVR on GSM8K only and another on MATH only 😆. And it works! A thread 🧵 1/N

My recurrent refrain of the year is to really use the environments in pufferlib. There’s no reason not to have your environments run at a million fps on a single CPU core github.com/PufferAI/Puf...

Have you ever wanted to add curriculum learning (CL) to an RL project but decided it wasn't worth the effort? I'm happy to announce the release of Syllabus, a library of portable curriculum learning methods that work with any RL code! github.com/RyanNavillus...

Another awesome iteration of Genie! I fully agree with training generalist agents in simulation like this, though I believe in using real games to teach long-term strategies. Still, it’s easy to see how SIMA and Genie will continue to improve, and maybe even give us a true foundation model for RL.

This is one of my favorite lines of work in RL. When I was starting my PhD, I was working on a multi-agent evaluation problem, having just finished a “voting math” class in my last semester at Purdue. I scribbled some notes about how games in a tournament could be viewed as votes… 1/2

I just got here. Thanks @rockt.ai for putting together an open-endedness starter pack! If there's anyone else working on exploration, curriculum learning, or open-ended environments, leave a reply so I can follow you! I'll be sharing some cool curriculum learning work in a few days, so stay tuned!