gabrielpeyre.bsky.social
44 posts 2,442 followers 165 following

My article on the mathematics of AI has been published in the Gazette de la SMF smf.emath.fr/publications... The English version is on arXiv arxiv.org/abs/2501.10465

I have cleaned up my lecture notes on Optimal Transport for Machine Learners a bit arxiv.org/abs/2505.06589

Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science mathsdata2025.github.io EPFL, Sept 1–5, 2025. Speakers: Bach @bachfrancis.bsky.social, Bandeira, Mallat, Montanari, Peyré @gabrielpeyre.bsky.social. For PhD students & early-career researchers. Apply before May 15!

Applications are 📣OPEN📣 for #PAISS2025 THE AI summer school in #Grenoble 1-5 Sept! Speakers so far @yann-lecun.bsky.social @dimadamen.bsky.social @arthurgretton.bsky.social @gabrielpeyre.bsky.social @science4all.org A. Cristia J. Revaud M. Caron J. Carpentier M. Vladimirova ➡️ paiss.inria.fr

The AI for Science summer school, co-organized by CNRS and the University of Chicago, will take place in Paris, June 30th to July 4th. Register ASAP if you want to attend! datascience.uchicago.edu/events/ai-sc...

Future best seller!

Finely characterizing the decay of the eigenvalues of kernel matrices: many people need it, but explicit references are hard to find. This blog post reviews amazing asymptotic results from Harold Widom (1963!) and proposes new non-asymptotic bounds. francisbach.com/spectrum-ker...
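To see the phenomenon the post is about, here is a minimal NumPy sketch (my own illustration, not taken from the blog post): the spectrum of a Gaussian kernel matrix on uniform points decays extremely fast.

```python
import numpy as np

# Toy illustration: eigenvalues of a Gaussian kernel matrix
# on n uniform points in [0, 1] decay near-geometrically.
rng = np.random.default_rng(0)
n, sigma = 500, 0.5
x = rng.uniform(0, 1, n)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma**2))
eigs = np.linalg.eigvalsh(K)[::-1]  # sorted, largest first
print(eigs[:10] / n)                # normalized spectrum drops off fast
```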

⚡️Check out our workshop tomorrow at @lpiparis.bsky.social, great speakers (@gabrielpeyre.bsky.social, @sdascoli.bsky.social, @samillingworth.com & many more) will cover Theory and Applications of Generative AI + Connections with neuroscience 🧠 And there's food 🍰 ➡️ genai-conference-website.vercel.app

kyunghyuncho.me/softmax-fore...

@vickykalogeiton.bsky.social and @davidpicard.bsky.social updating their slides live to quote my talk just before ... next-level presentation!

Titouan Vayer and I are organizing a one-day workshop on optimal transport and machine learning at ENS Lyon on Feb. 17. Registration is free but mandatory. The incredible keynote speakers are Laetitia Chapel, Filippo Santambrogio and @brunolevy01.bsky.social. gdr-iasis.cnrs.fr/reunions/tra...

Excited to see Sigmoid Attention accepted at ICLR 2025!! Make attention ~18% faster with a drop-in replacement 🚀 Code: github.com/apple/ml-sig... Paper arxiv.org/abs/2409.04431
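For intuition, a minimal NumPy sketch of the idea (my own toy, not the optimized kernel from the repo): the softmax over keys is replaced by an elementwise sigmoid with a bias, which the paper suggests setting around -log(sequence length).

```python
import numpy as np

def sigmoid_attention(Q, K, V, b=None):
    # Toy sigmoid attention: per-entry sigmoid instead of a row softmax.
    # b is a bias; a value near -log(n) keeps outputs well scaled.
    n, d = Q.shape
    if b is None:
        b = -np.log(n)
    scores = Q @ K.T / np.sqrt(d) + b
    weights = 1.0 / (1.0 + np.exp(-scores))  # no row normalization
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(sigmoid_attention(Q, K, V).shape)  # (8, 16)
```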

How do tokens evolve as they are processed by a deep Transformer? With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322 ML and PDE lovers, check it out!
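As a toy illustration of this particle-system viewpoint (my own sketch, not the paper's model), one can simulate tokens whose velocity is a softmax-attention average of the other tokens and watch them cluster:

```python
import numpy as np

# Toy continuous-time self-attention dynamics: each token moves
# toward a softmax-weighted mean of all tokens (residual-style update).
rng = np.random.default_rng(1)
n, d, dt, steps = 32, 2, 0.1, 200
X = rng.standard_normal((n, d))
for _ in range(steps):
    A = X @ X.T / np.sqrt(d)
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)  # row-stochastic attention matrix
    X += dt * (A @ X - X)
print(np.std(X, axis=0))  # the spread shrinks as tokens cluster
```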

This review paper by @guillaume-garrigos.com on SGD-related algorithms is a fantastic resource, offering elegant, self-contained, and concise proofs in a single, accessible reference. arxiv.org/pdf/2301.11235

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

Slides for a general introduction to the use of Optimal Transport methods in learning, with an emphasis on diffusion models, flow matching, the training of two-layer neural networks, and deep transformers. speakerdeck.com/gpeyre/optim...

Convex functions are differentiable almost everywhere (a consequence of Hans Rademacher's 1919 theorem, since convex functions are locally Lipschitz) and even twice differentiable almost everywhere (Aleksandr Alexandrov, 1939). en.wikipedia.org/wiki/Alexand...
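For reference, here is the precise second-order statement (standard phrasing, mine):

```latex
% Alexandrov (1939): convex functions admit a second-order Taylor
% expansion at almost every point.
\begin{theorem}[Alexandrov]
Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be convex. For almost every
$x$ there exist a vector $\nabla f(x)$ and a symmetric matrix
$\nabla^2 f(x)$ such that
\[
  f(y) = f(x) + \langle \nabla f(x),\, y - x \rangle
       + \tfrac{1}{2} \langle \nabla^2 f(x)(y - x),\, y - x \rangle
       + o(\|y - x\|^2).
\]
\end{theorem}
```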

It is well known that symmetric matrices can be diagonalized over R. A lesser-known, more general result is that if A and B are symmetric d x d matrices and A is positive definite, then AB is diagonalizable, with real eigenvalues.
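The proof is a one-line similarity argument (a standard computation):

```latex
% AB is similar to a symmetric matrix via the square root of A:
\[
  A^{-1/2} (AB)\, A^{1/2} = A^{1/2} B\, A^{1/2} .
\]
% The right-hand side is symmetric, hence orthogonally diagonalizable
% with real eigenvalues; AB, being similar to it, is therefore
% diagonalizable with the same real eigenvalues.
```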

In 1988, John Lafferty introduced what he called the “density manifold,” which corresponds to the Wasserstein metric later studied in depth by Felix Otto in the context of Optimal Transport gradient flows. (Yes, he also later co-developed Conditional Random Fields!) www.jstor.org/stable/2000885

This paper by Hornik et al. demonstrates the *uniform* approximation universality of 2-layer MLPs with sigmoid activation functions, leveraging the fact that sinusoids can approximate any function through Fourier expansion. www.cs.cmu.edu/~epxing/Clas...
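A quick numerical illustration of this universality (my own toy, unrelated to the paper's proof technique): draw random hidden sigmoid units and fit the output layer by least squares; the uniform error on a smooth target is already tiny.

```python
import numpy as np

# Toy 2-layer sigmoid network: random hidden units + least-squares
# output weights approximate a smooth target uniformly well.
rng = np.random.default_rng(0)
m = 200                                    # hidden width
x = np.linspace(-1, 1, 400)[:, None]
target = np.sin(4 * np.pi * x[:, 0])
W = 10 * rng.standard_normal((1, m))       # random inner weights
b = 10 * rng.standard_normal(m)            # random biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))     # hidden sigmoid features
a, *_ = np.linalg.lstsq(H, target, rcond=None)
print(np.max(np.abs(H @ a - target)))      # small sup-norm error
```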

Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations: these are, respectively, the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.
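To make the three interpolations concrete, here are the standard formulas (my summary):

```latex
% Three interpolations between mu_0 and mu_1; each one solves the
% continuity equation  \partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0
% for its own velocity field v_t.
\begin{align*}
  \text{(optimal transport / Benamou--Brenier)} \quad
    & \mu_t = \big((1-t)\,\mathrm{Id} + t\,T\big)_{\#}\,\mu_0,
      \quad T \text{ the optimal map}, \\
  \text{(convolution / flow matching)} \quad
    & \mu_t = \mathrm{Law}\big((1-t)X_0 + t X_1\big),
      \quad X_0 \sim \mu_0 \text{ independent of } X_1 \sim \mu_1, \\
  \text{(averaging / Dacorogna--Moser)} \quad
    & \mu_t = (1-t)\,\mu_0 + t\,\mu_1 .
\end{align*}
```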

I have updated my course notes on Optimal Transport with a new Chapter 9 on Wasserstein flows. It includes 3 illustrative applications: training a 2-layer MLP, deep transformers, and flow-matching generative models. You can access it here: mathematical-tours.github.io/book-sources...

Hellinger and Wasserstein are the two main geodesic distances on probability distributions. While both minimize the same kinetic energy, they differ in the underlying dynamics: Hellinger interpolates the densities in place, whereas Wasserstein displaces mass in space.
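The claim that both minimize the same energy can be made precise via their dynamic formulations (standard formulas, up to normalizing constants):

```latex
% Both distances minimize a kinetic energy \int_0^1 \int |r_t|^2
% \rho_t \, dx \, dt, but under different dynamics on \rho_t.
\begin{align*}
  W_2(\rho_0, \rho_1)^2
    &= \min_{(\rho_t, v_t)} \int_0^1 \!\! \int |v_t|^2 \rho_t \, dx \, dt
    \quad \text{s.t.} \quad \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \\
  \mathrm{Hellinger}(\rho_0, \rho_1)^2
    &\propto \min_{(\rho_t, r_t)} \int_0^1 \!\! \int |r_t|^2 \rho_t \, dx \, dt
    \quad \text{s.t.} \quad \partial_t \rho_t = \rho_t\, r_t .
\end{align*}
% Wasserstein moves mass in space (advection); Hellinger creates and
% destroys mass in place (reaction), with geodesic
% \rho_t = \big((1-t)\sqrt{\rho_0} + t \sqrt{\rho_1}\big)^2.
```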

Optimal transport computes an interpolation between two distributions using an optimal coupling. Flow matching, on the other hand, uses a simpler “independent” coupling, which is the product of the marginals.
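A minimal 1D illustration of the difference (my own toy): in 1D the optimal coupling simply pairs sorted samples, while the independent coupling pairs them at random; the two give different intermediate distributions.

```python
import numpy as np

# 1D toy: OT coupling = monotone rearrangement (pair sorted samples);
# flow-matching coupling = independent pairing (random permutation).
rng = np.random.default_rng(0)
n, t = 1000, 0.5
x0 = rng.normal(-2.0, 0.5, n)  # samples from mu_0
x1 = rng.normal(+2.0, 0.5, n)  # samples from mu_1

xt_ot = (1 - t) * np.sort(x0) + t * np.sort(x1)  # optimal coupling
xt_fm = (1 - t) * x0 + t * rng.permutation(x1)   # independent coupling

# Both paths start at mu_0 and end at mu_1, but the midpoint marginals
# differ: OT preserves the spread, the independent coupling shrinks it.
print(np.std(xt_ot), np.std(xt_fm))
```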