matpagliardini.bsky.social
PhD student in ML at EPFL 🇨🇭 working with Martin Jaggi & François Fleuret. Previously Apple MLR (intern). https://mpagli.github.io/
8 posts 223 followers 1,007 following

What is the true depth of an LLM? Together with @danielepal.bsky.social, @matpagliardini.bsky.social, M. Jaggi and @francois.fleuret.org we show that LLMs have an effective depth smaller than their nominal depth, which can be exploited to increase inference speed in multi-GPU settings! arxiv.org/abs/2502.02790 (1/N)
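(Not the paper's actual method — just a minimal toy sketch of the underlying intuition: if the effective depth is small, residual updates from consecutive layers can be applied to the same input in parallel, e.g. on different GPUs, with little change to the output. All names here are made up for illustration.)

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one transformer layer (residual MLP only, for brevity)."""
    def __init__(self, d):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):
        return self.ff(x)  # residual branch only; the caller adds it back

def sequential_depth(blocks, x):
    # Standard execution: each layer reads the output of the previous one.
    for blk in blocks:
        x = x + blk(x)
    return x

def pairwise_parallel_depth(blocks, x):
    # Hypothetical "shallower" execution: two consecutive layers both read
    # the same input and their residual updates are summed, halving the
    # sequential depth (each member of a pair could live on its own GPU).
    for i in range(0, len(blocks), 2):
        pair = blocks[i:i + 2]
        x = x + sum(blk(x) for blk in pair)
    return x

if __name__ == "__main__":
    torch.manual_seed(0)
    d, n_layers = 64, 8
    blocks = nn.ModuleList(ToyBlock(d) for _ in range(n_layers))
    x = torch.randn(2, 16, d)
    y_seq = sequential_depth(blocks, x)
    y_par = pairwise_parallel_depth(blocks, x)
    # If the effective depth is small, the two outputs should stay close.
    print("relative difference:", ((y_seq - y_par).norm() / y_seq.norm()).item())
```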

Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦‍⬛
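(A rough sketch of the recurrent-depth idea described above, assuming a weight-tied core block re-applied to a latent state; the module names and shapes are invented for illustration, not taken from the report.)

```python
import torch
import torch.nn as nn

class RecurrentDepthToy(nn.Module):
    """Toy model that re-applies one weight-tied block to a latent state.
    More iterations = more compute spent "thinking" about the same input."""
    def __init__(self, d_model):
        super().__init__()
        self.embed = nn.Linear(d_model, d_model)   # prelude (placeholder)
        self.core = nn.Sequential(                 # recurrent core, shared across steps
            nn.Linear(2 * d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.readout = nn.Linear(d_model, d_model) # coda (placeholder)

    def forward(self, x, n_steps: int):
        e = self.embed(x)
        s = torch.zeros_like(e)                    # internal latent state
        for _ in range(n_steps):                   # n_steps chosen at inference time
            s = s + self.core(torch.cat([s, e], dim=-1))
        return self.readout(s)

model = RecurrentDepthToy(d_model=32)
x = torch.randn(4, 32)
fast = model(x, n_steps=2)    # cheap answer
slow = model(x, n_steps=32)   # spend more latent compute on the same input
print(fast.shape, slow.shape)
```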

can we scale small, open LMs to o1 level? Using classical probabilistic inference methods, YES! A particle filtering approach improves inference without any training! Check out probabilistic-inference-scaling.github.io By Aisha Puri et al. 📈🤖 Joint MIT-CSAIL & RedHat
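(A minimal sketch of particle-filtering-style inference scaling, under my own assumptions: keep a population of partial generations, weight them with a scorer such as a reward model, resample, and extend the survivors. `propose` and `score` below are hypothetical stand-ins, not the project's API.)

```python
import random

def particle_filter_inference(propose, score, n_particles=8, n_steps=4):
    """Keep n_particles candidate partial solutions; at each step extend them,
    weight them by the scorer, and resample proportionally to the weights."""
    particles = ["" for _ in range(n_particles)]
    for _ in range(n_steps):
        # 1. extend each partial solution by one step
        particles = [propose(p) for p in particles]
        # 2. weight particles by the scorer (e.g. a process reward model)
        weights = [score(p) for p in particles]
        # 3. resample with replacement, proportionally to the weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=score)

# Hypothetical stand-ins for an LLM step proposal and a reward model:
def propose(prefix):
    return prefix + random.choice("ab")

def score(text):
    return 1.0 + text.count("a")  # pretend 'a'-heavy answers are better

print(particle_filter_inference(propose, score))
```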

New open-weights 24B model with performance comparable to Llama 3.3 70B 😮. Congrats, Mistral team! mistral.ai/news/mistral...

1/ 📘 Could ChatGPT get an engineering degree? Spoiler, yes! In our new @pnas.org article, we explore how AI assistants like GPT-4 perform in STEM university courses — and on average they pass a staggering 91.7% of core courses. 🧵 #AI #HigherEd #STEM #LLMs #NLProc

New blog post on flow matching: dl.heeere.com/cfm/ Contains some nice visuals too!
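(For context, a minimal conditional flow-matching sketch with a straight-line interpolation path; this is a generic toy version, not code from the linked post. The 2D toy data and network sizes are my own choices.)

```python
import torch
import torch.nn as nn

# Learn a velocity field v_theta(x_t, t) that predicts x1 - x0 along
# x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I) and x1 drawn from the data.
dim = 2
v_theta = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(v_theta.parameters(), lr=1e-3)

def sample_data(n):
    # toy 2D target distribution: two Gaussian blobs
    centers = torch.tensor([[2.0, 0.0], [-2.0, 0.0]])
    idx = torch.randint(0, 2, (n,))
    return centers[idx] + 0.3 * torch.randn(n, dim)

for step in range(1000):
    x1 = sample_data(256)                      # data sample
    x0 = torch.randn_like(x1)                  # noise sample
    t = torch.rand(x1.size(0), 1)              # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                 # point on the straight path
    target_v = x1 - x0                         # constant velocity of that path
    pred_v = v_theta(torch.cat([xt, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()   # flow-matching regression loss
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 with Euler steps.
x = torch.randn(512, dim)
with torch.no_grad():
    for i in range(50):
        t = torch.full((x.size(0), 1), i / 50)
        x = x + (1 / 50) * v_theta(torch.cat([x, t], dim=-1))
print(x.mean(0))
```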