ozekri.bsky.social
ENS Saclay maths dept + UW Research Intern. Website: https://oussamazekri.fr Blog: https://logb-research.github.io/

🚨 New paper on regression and classification! Adding to the discussion on least-squares vs. cross-entropy, and regression vs. classification formulations of supervised problems! A thread on how to bridge these problems:

🚀 Policy gradient methods like DeepSeek's GRPO are great for finetuning LLMs via RLHF. But what happens when we swap autoregressive generation for discrete diffusion, a rising architecture promising faster & more controllable LLMs? Introducing SEPO! 📑 arxiv.org/pdf/2502.01384 🧵👇
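For intuition, a minimal sketch of the GRPO-style group-relative objective the post starts from (illustrative names and setup, not the SEPO algorithm itself; the discrete-diffusion version is in the paper):

```python
# Illustrative GRPO-style loss (assumed setup, not SEPO itself): sample G
# completions per prompt, standardize their rewards within the group, and
# apply a PPO-style clipped policy-gradient objective.
import torch

def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    # logp_new / logp_old: (G,) summed completion log-probs under the
    # current and behavior policies; rewards: (G,) scalar rewards.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # group-relative advantage
    ratio = torch.exp(logp_new - logp_old)                     # importance ratios
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```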

Beautiful work!!

🚀 Proud to share our work on training dynamics in Transformers with Wassim Bouaziz & @viviencabannes.bsky.social @Inria @MetaAI 📝 Easing Optimization Paths arxiv.org/pdf/2501.02362 (accepted @ICASSP 2025 🥳) 📝 Clustering Heads 🔥 https://arxiv.org/pdf/2410.24050 🖥️ github.com/facebookrese... 1/🧵

Happy to see Disentangled In-Context Learning accepted at ICLR 2025 🥳 Make zero-shot reinforcement learning with LLMs go brrr 🚀 🖥️ github.com/abenechehab/... 📜 arxiv.org/pdf/2410.11711 Congrats to Abdelhakim (abenechehab.github.io) for leading it, always fun working with nice and strong people 🤗

For the French-speaking audience, S. Mallat's courses at the Collège de France on "Data generation in AI by transport and denoising" have just started. I highly recommend them, as I've learned a lot from the overall vision of his courses. Recordings are also available: www.youtube.com/watch?v=5zFh...

Speculative sampling accelerates inference in LLMs by drafting future tokens that are then verified in parallel. With @vdebortoli.bsky.social, A. Galashov & @arthurgretton.bsky.social, we extend this approach to (continuous-space) diffusion models: arxiv.org/abs/2501.05370
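For readers new to the idea, a minimal token-space sketch of the draft-then-verify acceptance rule (standard speculative sampling with illustrative names; the continuous-space diffusion extension is what the paper develops):

```python
# Standard token-space speculative sampling step (names are illustrative).
# A cheap draft model proposes k tokens; the target model scores them in
# one parallel forward pass and accepts each with probability min(1, p/q).
import numpy as np

def speculative_step(draft_probs, target_probs, drafted_tokens, rng):
    # draft_probs / target_probs: (k, V) next-token distributions at each
    # drafted position; drafted_tokens: the k tokens the draft proposed.
    out = []
    for t, x in enumerate(drafted_tokens):
        p, q = target_probs[t, x], draft_probs[t, x]
        if rng.random() < min(1.0, p / q):       # accept
            out.append(x)
        else:                                    # reject: resample from (p - q)_+
            residual = np.clip(target_probs[t] - draft_probs[t], 0.0, None)
            out.append(rng.choice(residual.size, p=residual / residual.sum()))
            break                                # later drafts are discarded
    # (The full algorithm also samples one bonus token from the target
    # model when every draft is accepted; omitted here for brevity.)
    return out
```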

The idea that one needs to know a lot of advanced math to start doing research in ML seems so wrong to me. Instead of reading books for weeks and forgetting most of them a year later, I think it's much better to try to do things, see what knowledge gaps prevent you from doing them, and only then read.

This seems like… what we started with, no? arxiv.org/abs/2410.02724

🚨 So, you want to predict your model's performance at test time? 🚨 💡 Our NeurIPS 2024 paper proposes 𝐌𝐚𝐍𝐨, a training-free and SOTA approach! 📑 arxiv.org/pdf/2405.18979 🖥️ https://github.com/Renchunzi-Xie/MaNo 1/🧵 (A surprise at the end!)

I wrote a summary of the main ingredients of the neat proof by Hugo Lavenant that diffusion models do not generally define optimal transport. github.com/mathematical...

🚀 Did you know you can use the in-context learning abilities of an LLM to estimate the transition probabilities of a Markov chain? The results are pretty exciting! 😄
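A minimal sketch of the idea, assuming a HuggingFace causal LM (GPT-2 and the space-separated state encoding are illustrative assumptions, not necessarily the thread's exact setup): show the LLM a trajectory of states as tokens, then read its next-token distribution restricted to the state tokens.

```python
# Sketch: estimate Markov-chain transition probabilities from an LLM's
# in-context next-token distribution. GPT-2 and the space-separated state
# encoding are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

states = ["A", "B"]                    # two-state chain
trajectory = "A B B A B A A B A B"     # observed states shown in context
ids = tok(trajectory, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[0, -1]  # next-token logits after the context

# Keep only the state tokens and renormalize: an estimate of the row of the
# transition matrix for the last state seen in context.
state_ids = [tok.encode(" " + s)[0] for s in states]
probs = torch.softmax(logits[state_ids], dim=-1)
print(dict(zip(states, probs.tolist())))
```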