gabrielpeyre.bsky.social
44 posts 2,442 followers 165 following

My article on the mathematics of AI has been published in the Gazette de la SMF smf.emath.fr/publications... The English version is on arXiv arxiv.org/abs/2501.10465

I have cleaned up my lecture notes on Optimal Transport for Machine Learners a bit arxiv.org/abs/2505.06589

Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science mathsdata2025.github.io EPFL, Sept 1–5, 2025. Speakers: Bach @bachfrancis.bsky.social, Bandeira, Mallat, Montanari, Peyré @gabrielpeyre.bsky.social. For PhD students & early-career researchers. Apply before May 15!

Applications are 📣OPEN📣 for #PAISS2025 THE AI summer school in #Grenoble 1-5 Sept! Speakers so far @yann-lecun.bsky.social @dimadamen.bsky.social @arthurgretton.bsky.social @gabrielpeyre.bsky.social @science4all.org A. Cristia J. Revaud M. Caron J. Carpentier M. Vladimirova ➡️ paiss.inria.fr

The AI for Science summer school, co-organized by CNRS and the University of Chicago, will take place in Paris, June 30th to July 4th. Register ASAP if you want to attend! datascience.uchicago.edu/events/ai-sc...

Future best seller!

Finely characterizing the decay of the eigenvalues of kernel matrices: many people need it, but explicit references are hard to find. This blog post reviews amazing asymptotic results from Harold Widom (1963!) and proposes new non-asymptotic bounds. francisbach.com/spectrum-ker...
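To see the phenomenon the post is about, here is a minimal NumPy sketch (my own illustration, not taken from the blog post): the spectrum of a Gaussian kernel matrix on uniform points decays extremely fast.

```python
import numpy as np

# Toy illustration: eigenvalues of a Gaussian kernel matrix
# on n uniform points in [0, 1] decay near-geometrically.
rng = np.random.default_rng(0)
n, sigma = 500, 0.5
x = rng.uniform(0, 1, n)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * sigma**2))
eigs = np.linalg.eigvalsh(K)[::-1]  # sorted, largest first
print(eigs[:10] / n)                # normalized spectrum drops off fast
```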

⚡️Check out our workshop tomorrow at @lpiparis.bsky.social, great speakers (@gabrielpeyre.bsky.social, @sdascoli.bsky.social, @samillingworth.com & many more) will cover Theory and Applications of Generative AI + Connections with neuroscience 🧠 And there's food 🍰 ➡️ genai-conference-website.vercel.app

kyunghyuncho.me/softmax-fore...

@vickykalogeiton.bsky.social and @davidpicard.bsky.social updating their slides live to quote my talk just before ... next-level presentation!

Titouan Vayer and I are organizing a one-day workshop on optimal transport and machine learning at ENS Lyon on Feb. 17. Registration is free but mandatory. The incredible keynote speakers are Laetitia Chapel, Filippo Santambrogio and @brunolevy01.bsky.social. gdr-iasis.cnrs.fr/reunions/tra...

Excited to see Sigmoid Attention accepted at ICLR 2025!! Make attention ~18% faster with a drop-in replacement 🚀 Code: github.com/apple/ml-sig... Paper arxiv.org/abs/2409.04431
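For intuition, a minimal NumPy sketch of the idea (my own toy, not the optimized kernel from the repo): the softmax over keys is replaced by an elementwise sigmoid with a bias, which the paper suggests setting around -log(sequence length).

```python
import numpy as np

def sigmoid_attention(Q, K, V, b=None):
    # Toy sigmoid attention: per-entry sigmoid instead of a row softmax.
    # b is a bias; a value near -log(n) keeps outputs well scaled.
    n, d = Q.shape
    if b is None:
        b = -np.log(n)
    scores = Q @ K.T / np.sqrt(d) + b
    weights = 1.0 / (1.0 + np.exp(-scores))  # no row normalization
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(sigmoid_attention(Q, K, V).shape)  # (8, 16)
```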

How do tokens evolve as they are processed by a deep Transformer? With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322 ML and PDE lovers, check it out!
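As a toy illustration of this particle-system viewpoint (my own sketch, not the paper's model), one can simulate tokens whose velocity is a softmax-attention average of the other tokens and watch them cluster:

```python
import numpy as np

# Toy continuous-time self-attention dynamics: each token moves
# toward a softmax-weighted mean of all tokens (residual-style update).
rng = np.random.default_rng(1)
n, d, dt, steps = 32, 2, 0.1, 200
X = rng.standard_normal((n, d))
for _ in range(steps):
    A = X @ X.T / np.sqrt(d)
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)  # row-stochastic attention matrix
    X += dt * (A @ X - X)
print(np.std(X, axis=0))  # the spread shrinks as tokens cluster
```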

This review paper by @guillaume-garrigos.com on SGD-related algorithms is a fantastic resource, offering elegant, self-contained, and concise proofs in a single, accessible reference. arxiv.org/pdf/2301.11235

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

Slides for a general introduction to the use of Optimal Transport methods in learning, with an emphasis on diffusion models, flow matching, the training of two-layer neural networks, and deep transformers. speakerdeck.com/gpeyre/optim...

Convex functions are differentiable almost everywhere (a consequence of Hans Rademacher's 1919 theorem, since convex functions are locally Lipschitz) and even twice differentiable almost everywhere (Aleksandr Alexandrov, 1939). en.wikipedia.org/wiki/Alexand...
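For reference, here is the precise second-order statement (standard phrasing, mine):

```latex
% Alexandrov (1939): convex functions admit a second-order Taylor
% expansion at almost every point.
\begin{theorem}[Alexandrov]
Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be convex. For almost every
$x$ there exist a vector $\nabla f(x)$ and a symmetric matrix
$\nabla^2 f(x)$ such that
\[
  f(y) = f(x) + \langle \nabla f(x),\, y - x \rangle
       + \tfrac{1}{2} \langle \nabla^2 f(x)(y - x),\, y - x \rangle
       + o(\|y - x\|^2).
\]
\end{theorem}
```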

It is well known that symmetric matrices can be diagonalized over R. A lesser-known, more general result is that if A and B are symmetric d x d matrices and A is positive definite, then AB is diagonalizable, with real eigenvalues.
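The proof is a one-line similarity argument (a standard computation):

```latex
% AB is similar to a symmetric matrix via the square root of A:
\[
  A^{-1/2} (AB)\, A^{1/2} = A^{1/2} B\, A^{1/2} .
\]
% The right-hand side is symmetric, hence orthogonally diagonalizable
% with real eigenvalues; AB, being similar to it, is therefore
% diagonalizable with the same real eigenvalues.
```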

In 1988, John Lafferty introduced what he called the “density manifold,” which corresponds to the Wasserstein metric later studied in depth by Felix Otto in the context of Optimal Transport gradient flows. (Yes, he also later co-developed Conditional Random Fields!) www.jstor.org/stable/2000885

This paper by Hornik et al. demonstrates the *uniform* approximation universality of 2-layer MLPs with sigmoid activation functions, leveraging the fact that sinusoids can approximate any function through Fourier expansion. www.cs.cmu.edu/~epxing/Clas...
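A quick numerical illustration of this universality (my own toy, unrelated to the paper's proof technique): draw random hidden sigmoid units and fit the output layer by least squares; the uniform error on a smooth target is already tiny.

```python
import numpy as np

# Toy 2-layer sigmoid network: random hidden units + least-squares
# output weights approximate a smooth target uniformly well.
rng = np.random.default_rng(0)
m = 200                                    # hidden width
x = np.linspace(-1, 1, 400)[:, None]
target = np.sin(4 * np.pi * x[:, 0])
W = 10 * rng.standard_normal((1, m))       # random inner weights
b = 10 * rng.standard_normal(m)            # random biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))     # hidden sigmoid features
a, *_ = np.linalg.lstsq(H, target, rcond=None)
print(np.max(np.abs(H @ a - target)))      # small sup-norm error
```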

Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations: these are, respectively, the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.
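To make the three interpolations concrete, here are the standard formulas (my summary):

```latex
% Three interpolations between mu_0 and mu_1; each one solves the
% continuity equation  \partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0
% for its own velocity field v_t.
\begin{align*}
  \text{(optimal transport / Benamou--Brenier)} \quad
    & \mu_t = \big((1-t)\,\mathrm{Id} + t\,T\big)_{\#}\,\mu_0,
      \quad T \text{ the optimal map}, \\
  \text{(convolution / flow matching)} \quad
    & \mu_t = \mathrm{Law}\big((1-t)X_0 + t X_1\big),
      \quad X_0 \sim \mu_0 \text{ independent of } X_1 \sim \mu_1, \\
  \text{(averaging / Dacorogna--Moser)} \quad
    & \mu_t = (1-t)\,\mu_0 + t\,\mu_1 .
\end{align*}
```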

I have updated my course notes on Optimal Transport with a new Chapter 9 on Wasserstein flows. It includes 3 illustrative applications: training a 2-layer MLP, deep transformers, and flow-matching generative models. You can access it here: mathematical-tours.github.io/book-sources...

Hellinger and Wasserstein are the two main geodesic distances on probability distributions. While both minimize the same kinetic energy, they differ in the underlying dynamics: Hellinger interpolates the densities in place, whereas Wasserstein displaces mass in space.
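The claim that both minimize the same energy can be made precise via their dynamic formulations (standard formulas, up to normalizing constants):

```latex
% Both distances minimize a kinetic energy \int_0^1 \int |r_t|^2
% \rho_t \, dx \, dt, but under different dynamics on \rho_t.
\begin{align*}
  W_2(\rho_0, \rho_1)^2
    &= \min_{(\rho_t, v_t)} \int_0^1 \!\! \int |v_t|^2 \rho_t \, dx \, dt
    \quad \text{s.t.} \quad \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \\
  \mathrm{Hellinger}(\rho_0, \rho_1)^2
    &\propto \min_{(\rho_t, r_t)} \int_0^1 \!\! \int |r_t|^2 \rho_t \, dx \, dt
    \quad \text{s.t.} \quad \partial_t \rho_t = \rho_t\, r_t .
\end{align*}
% Wasserstein moves mass in space (advection); Hellinger creates and
% destroys mass in place (reaction), with geodesic
% \rho_t = \big((1-t)\sqrt{\rho_0} + t \sqrt{\rho_1}\big)^2.
```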

Optimal transport computes an interpolation between two distributions using an optimal coupling. Flow matching, on the other hand, uses a simpler “independent” coupling, which is the product of the marginals.
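A minimal 1D illustration of the difference (my own toy): in 1D the optimal coupling simply pairs sorted samples, while the independent coupling pairs them at random; the two give different intermediate distributions.

```python
import numpy as np

# 1D toy: OT coupling = monotone rearrangement (pair sorted samples);
# flow-matching coupling = independent pairing (random permutation).
rng = np.random.default_rng(0)
n, t = 1000, 0.5
x0 = rng.normal(-2.0, 0.5, n)  # samples from mu_0
x1 = rng.normal(+2.0, 0.5, n)  # samples from mu_1

xt_ot = (1 - t) * np.sort(x0) + t * np.sort(x1)  # optimal coupling
xt_fm = (1 - t) * x0 + t * rng.permutation(x1)   # independent coupling

# Both paths start at mu_0 and end at mu_1, but the midpoint marginals
# differ: OT preserves the spread, the independent coupling shrinks it.
print(np.std(xt_ot), np.std(xt_fm))
```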