amirmesbah.bsky.social
Graduate Student - Interested in RL and its mathematics 👾 > https://amirhosein-mesbah.github.io/
6 posts 45 followers 425 following

First 11 chapters of RLHF Book have v0 draft done. Should be useful now. Next: * Crafting more blog content into future topics, * DPO+ chapter, * Meeting with publishers to get wheels turning on physical copies, * Cleaning & cohesiveness rlhfbook.com

🚨 Neuromatch Academy Course Applications are OPEN for 2025!! 🚨 Get your application in early to be a student or teaching assistant for this year’s courses! Applications are due Sunday, March 23. Apply & learn more: neuromatch.io/courses/ #mlsky #compneurosky #ai #climatesolutions #ScienceEdu 🧪

2014 GoogLeNet: The best image classifier was only trainable using weeks of Google's custom infrastructure. 2018 ResNet: A more accurate model is trainable in a 1/2 hour on a single GPU. What stops this from happening for LLMs?

I am teaching a class on #FoundationalModels for #robotics and Scaling #DeepRL algorithms. This class expands on last year's class and my generalist robotics policies tutorial and code. I plan to share the lectures and code assignments. Starting with the first lectures below.

I wonder why ML conferences insist on uploading workshop videos on SlideShare while they can use YouTube and the benefits of monetization. Talks on SlideShare are really hard to track!

i was recently asked to provide 4 "desert island" RL papers. if i were stuck on a desert island i'd hope to have something better to read than #RL papers... but anyway, here's a thread with my choices, maybe you can read them on your flight to @neuripsconf.bsky.social #NeurIPS2024 . Enjoy!

If you're an RL researcher or RL adjacent, pipe up to make sure I've added you here! go.bsky.app/3WPHcHg

As my first post on this platform, allow me to advertise the RL theory lecture notes I have been developing with Sasha Rakhlin: arxiv.org/abs/2312.16730 (shameless repost of my pinned tweet)

I have become a fan of the game-theoretic approaches to RLHF, so here are two more papers in that category! (with one more tomorrow 😅) 1. Self-Play Preference Optimization (SPO). 2. Direct Nash Optimization (DNO). 🧵 1/3.

Hey academic Bluesky 👀👋