ozekri.bsky.social
ENS Saclay maths dept + UW Research Intern. Website: https://oussamazekri.fr Blog: https://logb-research.github.io/

🚨 New paper on regression and classification! Adding to the discussion on least-squares vs. cross-entropy, and regression vs. classification formulations of supervised problems! A thread on how to bridge these problems:

🚀 Policy gradient methods like DeepSeek's GRPO are great for finetuning LLMs via RLHF. But what happens when we swap autoregressive generation for discrete diffusion, a rising architecture promising faster & more controllable LLMs? Introducing SEPO! 📑 arxiv.org/pdf/2502.01384 🧵👇
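For intuition, a minimal sketch of the GRPO-style group-relative objective the post starts from (illustrative names and setup, not the SEPO algorithm itself; the discrete-diffusion version is in the paper):

```python
# Illustrative GRPO-style loss (assumed setup, not SEPO itself): sample G
# completions per prompt, standardize their rewards within the group, and
# apply a PPO-style clipped policy-gradient objective.
import torch

def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    # logp_new / logp_old: (G,) summed completion log-probs under the
    # current and behavior policies; rewards: (G,) scalar rewards.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # group-relative advantage
    ratio = torch.exp(logp_new - logp_old)                     # importance ratios
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```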

Beautiful work!!

🚀 Proud to share our work on training dynamics in Transformers with Wassim Bouaziz & @viviencabannes.bsky.social @Inria @MetaAI 📝 Easing Optimization Paths arxiv.org/pdf/2501.02362 (accepted @ICASSP 2025 🥳) 📝 Clustering Heads 🔥 https://arxiv.org/pdf/2410.24050 🖥️ github.com/facebookrese... 1/🧵

Happy to see Disentangled In-Context Learning accepted at ICLR 2025 🥳 Make zero-shot reinforcement learning with LLMs go brrr 🚀 🖥️ github.com/abenechehab/... 📜 arxiv.org/pdf/2410.11711 Congrats to Abdelhakim (abenechehab.github.io) for leading it, always fun working with nice and strong people 🤗

For the French-speaking audience, S. Mallat's courses at the Collège de France on "Data generation in AI by transport and denoising" have just started. I highly recommend them, as I've learned a lot from the overall vision of his courses. Recordings are also available: www.youtube.com/watch?v=5zFh...

Speculative sampling accelerates inference in LLMs by drafting future tokens that are then verified in parallel. With @vdebortoli.bsky.social, A. Galashov & @arthurgretton.bsky.social, we extend this approach to (continuous-space) diffusion models: arxiv.org/abs/2501.05370
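For readers new to the idea, a minimal token-space sketch of the draft-then-verify acceptance rule (standard speculative sampling with illustrative names; the continuous-space diffusion extension is what the paper develops):

```python
# Standard token-space speculative sampling step (names are illustrative).
# A cheap draft model proposes k tokens; the target model scores them in
# one parallel forward pass and accepts each with probability min(1, p/q).
import numpy as np

def speculative_step(draft_probs, target_probs, drafted_tokens, rng):
    # draft_probs / target_probs: (k, V) next-token distributions at each
    # drafted position; drafted_tokens: the k tokens the draft proposed.
    out = []
    for t, x in enumerate(drafted_tokens):
        p, q = target_probs[t, x], draft_probs[t, x]
        if rng.random() < min(1.0, p / q):       # accept
            out.append(x)
        else:                                    # reject: resample from (p - q)_+
            residual = np.clip(target_probs[t] - draft_probs[t], 0.0, None)
            out.append(rng.choice(residual.size, p=residual / residual.sum()))
            break                                # later drafts are discarded
    # (The full algorithm also samples one bonus token from the target
    # model when every draft is accepted; omitted here for brevity.)
    return out
```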

The idea that one needs to know a lot of advanced math to start doing research in ML seems so wrong to me. Instead of reading books for weeks and forgetting most of them a year later, I think it's much better to try to do things, see what knowledge gaps prevent you from doing them, and only then read.

This seems like… what we started with, no? arxiv.org/abs/2410.02724

🚨 So, you want to predict your model's performance at test time? 🚨 💡 Our NeurIPS 2024 paper proposes 𝐌𝐚𝐍𝐨, a training-free and SOTA approach! 📑 arxiv.org/pdf/2405.18979 🖥️ https://github.com/Renchunzi-Xie/MaNo 1/🧵 (A surprise at the end!)

I wrote a summary of the main ingredients of the neat proof by Hugo Lavenant that diffusion models do not generally define optimal transport. github.com/mathematical...

🚀 Did you know you can use the in-context learning abilities of an LLM to estimate the transition probabilities of a Markov chain? The results are pretty exciting! 😄
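A minimal sketch of the idea, assuming a HuggingFace causal LM (GPT-2 and the space-separated state encoding are illustrative assumptions, not necessarily the thread's exact setup): show the LLM a trajectory of states as tokens, then read its next-token distribution restricted to the state tokens.

```python
# Sketch: estimate Markov-chain transition probabilities from an LLM's
# in-context next-token distribution. GPT-2 and the space-separated state
# encoding are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

states = ["A", "B"]                    # two-state chain
trajectory = "A B B A B A A B A B"     # observed states shown in context
ids = tok(trajectory, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[0, -1]  # next-token logits after the context

# Keep only the state tokens and renormalize: an estimate of the row of the
# transition matrix for the last state seen in context.
state_ids = [tok.encode(" " + s)[0] for s in states]
probs = torch.softmax(logits[state_ids], dim=-1)
print(dict(zip(states, probs.tolist())))
```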