huytransformer.com
ML, Phys. We're living in a simulation 🧑‍💻
76 posts 5,206 followers 1,522 following

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets Hikari Otsuka, Daiki Chijiwa, Ángel López García-Arias et al. Action editor: Zhangyang Wang https://openreview.net/forum?id=xpnPYfufhz #imagenet #subnetworks #sparser

Sangwoong Yoon, Himchan Hwang, Hyeokju Jeong, Dong Kyu Shin, Che-Sang Park, Sehee Kwon, Frank Chongwoo Park: Value Gradient Sampler: Sampling as Sequential Decision Making https://arxiv.org/abs/2502.13280 https://arxiv.org/pdf/2502.13280 https://arxiv.org/html/2502.13280

The Hausdorff distance quantifies how far two sets are from each other: it is the largest distance from a point in either set to its nearest point in the other. It captures worst-case deviation: if one set has points far from the other, the distance is large.
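In symbols, for subsets $A, B$ of a metric space $(X, d)$:

$$ d_H(A, B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} d(a, b), \;\; \sup_{b \in B} \inf_{a \in A} d(a, b) \Big\} $$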

Signed Distance Functions/Fields define shapes implicitly by measuring the shortest distance from any point to the surface. Negative values are inside, and positive values are outside. SDFs allow smooth blending and Boolean ops for procedural modeling. https://iquilezles.org/articles/distfunctions/
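A minimal Python sketch of the idea (the linked article works in GLSL; the smooth union below follows its polynomial smooth-min formulation):

```python
import numpy as np

def sd_sphere(p, center, radius):
    # signed distance to a sphere: negative inside, zero on the surface, positive outside
    return np.linalg.norm(p - center) - radius

def smooth_union(d1, d2, k=0.1):
    # polynomial smooth-min: blends two distance fields into one rounded union
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

p = np.array([0.0, 0.0, 0.0])
d = smooth_union(sd_sphere(p, np.array([ 0.3, 0.0, 0.0]), 0.5),
                 sd_sphere(p, np.array([-0.3, 0.0, 0.0]), 0.5))
print(d)  # negative: the query point lies inside the blended shape
```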

Excited about our progress in characterizing The Computational Advantage of Depth in Learning with Neural Networks. Check out the number of samples that can be saved when GD runs on a multi-layer rather than on a two-layer neural network. arxiv.org/pdf/2502.13961

All the good stuff from decades ago is now coming back and making its way into modern deep architectures: PCA, wavelets, Fourier transform ... Now waiting for the Hough transform and Zernike moments 😀

I'm planning to resume my habit of periodically publishing old blog notes which I've written up (many of which originate in old tweet threads and such), even if they're not especially polished. One which I still find pretty cute: 'Gaussian Smoothing' hackmd.io/@sp-monte-ca...

Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin Improving the Diffusability of Autoencoders https://arxiv.org/abs/2502.14831

Leo Zhang, Peter Potaptchik, Arnaud Doucet, Hai-Dang Dau, Saifuddin Syed Generalised Parallel Tempering: Flexible Replica Exchange via Flows and Diffusions https://arxiv.org/abs/2502.10328

arxiv.org/abs/2006.10739

📣 New preprint 📣 Learning Theory for Kernel Bilevel Optimization w/ @fareselkhoury.bsky.social E. Pauwels @michael-arbel.bsky.social We provide generalization error bounds for bilevel optimization problems where the inner objective is minimized over an RKHS. arxiv.org/abs/2502.08457

Are you still using LoRA to fine-tune your LLM? 2024 has seen an explosion of new parameter-efficient fine-tuning (PEFT) techniques, thanks to clever uses of the singular value decomposition (SVD). Let's dive into the alphabet soup: SVF, SVFT, MiLoRA, PiSSA, LoRA-XS 🤯...
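A toy sketch of the shared idea behind the SVD-based variants (not any one paper's exact recipe): factor a frozen pretrained weight as U diag(S) Vᵀ and train only a small set of parameters tied to that factorization, here a rescaling of the singular values:

```python
import torch

W = torch.randn(768, 768)                       # stand-in for a pretrained weight
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
U, S, Vh = U.detach(), S.detach(), Vh.detach()  # frozen SVD factors

scale = torch.nn.Parameter(torch.ones_like(S))  # the only trainable tensor

def adapted_forward(x):
    # effective weight is U diag(S * scale) V^T; gradients only reach `scale`
    return x @ (Vh.T * (S * scale)) @ U.T

x = torch.randn(4, 768)
loss = adapted_forward(x).pow(2).mean()
loss.backward()
print(scale.grad.shape)  # torch.Size([768]): 768 trainable values vs 768**2 frozen ones
```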

Implemented the Principle of Stationary Action (for a minimum) using Apple's MLX autograd, bridging variational classical mechanics and modern ML gradients. The animation shows the convergence to the true equation of motion by minimizing the action integral. Made with #python #mlx #matplotlib
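Not the original code, just a rough sketch of the same idea (assuming mlx.core's grad/linspace/concatenate API), with a harmonic-oscillator potential and fixed endpoints:

```python
import mlx.core as mx

# Discretize a path x(t) on [0, T] with endpoints x(0) = 0, x(T) = 1, then run
# gradient descent on the discretized action S[x] = sum(0.5*v^2 - V(x)) * dt,
# using MLX autograd for dS/dx. Interior points move; endpoints stay clamped.
n, dt = 41, 0.05                                   # T = 2.0
x = mx.linspace(0.0, 1.0, n)                       # initial guess: a straight line
mask = mx.concatenate([mx.zeros(1), mx.ones(n - 2), mx.zeros(1)])

def action(path):
    v = (path[1:] - path[:-1]) / dt                # finite-difference velocities
    kinetic = 0.5 * mx.sum(v ** 2) * dt
    potential = mx.sum(0.5 * path ** 2) * dt       # harmonic potential V(x) = x^2 / 2
    return kinetic - potential

grad_action = mx.grad(action)
for _ in range(10_000):
    x = x - 0.02 * mask * grad_action(x)           # plain gradient descent on the path

# x now approximates the true trajectory x(t) = sin(t) / sin(2.0)
```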

The GAN is dead; long live the GAN! A Modern Baseline GAN This is a very interesting paper, exploring making GANs simpler and more performant. abs: arxiv.org/abs/2501.05441 code: github.com/brownvc/R3GAN

📢PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, so the FOMO is real😵‍💫 Luckily it's all recorded, so I've been catching up on what I missed. Thread below with some personal highlights🧵

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot... A book to learn all about 5D parallelism, ZeRO, CUDA kernels, and how/why to overlap compute & comms, with theory, motivation, interactive plots and 4000+ experiments!

Variational Flow Matching goes Riemannian! 🔮 In this preliminary work, we derive a variational objective for probability flows 🌀 on manifolds with closed-form geodesics, and discuss some interesting results. Dream team: Floor, Alison & Erik (their @ below) 💥 📜 arxiv.org/abs/2502.12981 🧵1/5

What are the grand challenges in Bayesian computation? statmodeling.stat.columbia.edu/2025/02/19/w...

LEAPS is a groundbreaking algorithm for efficient sampling from discrete distributions via deep learning. Using continuous-time Markov chains, it minimizes variance in importance weights and shows promise for applications in statistical physics and beyond. https://arxiv.org/abs/2502.10843

James Thornton, Louis Bethune, Ruixiang Zhang, Arwen Bradley, Preetum Nakkiran, Shuangfei Zhai Composition and Control with Distilled Energy Diffusion Models and Sequential Monte Carlo https://arxiv.org/abs/2502.12786

Boris Meinardus: How I'd learn ML in 2025 (if I could start over) www.youtube.com/watch?v=_xIw.... (me too 😄)

These blogs for RBC Borealis consider infinite-width neural networks from 4 viewpoints. We use gradient descent or a Bayesian approach, and, for each, we focus on either the weights or output function. This leads to the Neural Tangent Kernel, Bayesian NNs and NNGPs. Enjoy! tinyurl.com/yfsts565

Here's the 2nd part of my series on ODEs and SDEs in ML. This article introduces ODEs and is suitable for novices: rbcborealis.com/research-blo... We describe ODEs, vector ODEs and PDEs and categorize ODEs by how their solutions are related. We describe conditions for an ODE to have a solution.

A recording of my talk from this afternoon: youtu.be/jSeXZ6IjKn8?...

bc i haven't done so yet, i decided to burn any remaining bridge to the land of statistics. it wasn't statisticians nor statistics but it was me. i am simply not good enough to do statistics myself. so, @peyrardmax.bsky.social and i decided to turn statistical estimation into supervised learning.

Zhicong Tang, Jianmin Bao, Dong Chen, Baining Guo Diffusion Models without Classifier-free Guidance https://arxiv.org/abs/2502.12154

The Clarke derivative generalizes the convex subdifferential to nonconvex locally Lipschitz functions. By Rademacher's theorem such functions are differentiable almost everywhere, and the Clarke subdifferential at a point is the convex hull of all limits of gradients taken along sequences of differentiability points converging to that point. https://ams.org/journals/tran/1975-205-00/S0002-9947-1975-0367131-6/home.html
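Concretely, for locally Lipschitz $f:\mathbb{R}^n \to \mathbb{R}$:

$$ \partial_C f(x) = \operatorname{conv}\Big\{ \lim_{k \to \infty} \nabla f(x_k) \;:\; x_k \to x, \ f \text{ differentiable at } x_k \Big\} $$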

One of the most interesting new research directions in the LLM space: introducing Large Language Diffusion Models. arxiv.org/pdf/2502.09992

Interesting work from folks at Apple. I like the catch-all term "rectified flow matching." arxiv.org/abs/2502.09616

This blog post by @drscotthawley.bsky.social provides a very accessible overview of flow matching / rectified flow and reflow, based on intuitions from physics, rather than starting from probability distributions. The visualisations and animations are excellent, and the whole thing is also a colab!

Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries https://arxiv.org/abs/2302.12828

Finite mixture models are useful when data comes from multiple latent processes. BayesFlow allows:
• Approximating the joint posterior of model parameters and mixture indicators
• Inferences for independent and dependent mixtures
• Amortization for fast and accurate estimation
📄 Preprint 💻 Code

Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: boyuan.space/history-guidance (1/7)

I'm very excited to share notes on Probabilistic AI that I have been writing with @arkrause.bsky.social 🥳 arxiv.org/pdf/2502.05244 These notes aim to give a graduate-level introduction to probabilistic ML + sequential decision-making. I'm super glad to be able to share them with all of you now!

Great interview with @jascha.sohldickstein.com about diffusion models! This is the first in a series: similar interviews with Yang Song and yours truly will follow soon. (One of these is not like the others -- both of them basically invented the field, and I occasionally write a blog post 🥲)

Very nicely produced video on diffusion models. youtu.be/1pgiu--4W3I

@cvprconference.bsky.social go.bsky.app/45EuhSi

finally managed to sneak my dog into a paper: arxiv.org/abs/2502.04549

How to accelerate the inference of discrete diffusion model / improve its accuracy? Please check out arxiv.org/abs/2502.00234 . Grateful to have worked with a luxurious team of Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, @grant.rotskoff.cc and @lexingying.bsky.social !

Really excited about this! We note a connection between diffusion/flow models and neural/latent SDEs. We show how to use this for simulation-free learning of fully flexible SDEs. We refer to this as SDE Matching and show speed improvements of several orders of magnitude. arxiv.org/abs/2502.02472

Better diffusions with scoring rules! Fewer, larger denoising steps using distributional losses; learn the posterior distribution of clean samples given the noisy versions. arxiv.org/pdf/2502.02483 @vdebortoli.bsky.social Galashov Guntupalli Zhou @sirbayes.bsky.social @arnauddoucet.bsky.social

Does anyone have pointers to tutorials on double descent and what it tells us about "overfitting"?

Keen to digest this one: arxiv.org/abs/2502.01353 'A coupling approach to Lipschitz transport maps' - Giovanni Conforti, Katharina Eichinger

Optimal Transport for Domain Adaptation through Gaussian Mixture Models Eduardo Fernandes Montesuma, Fred Maurice NGOLE MBOULA, Antoine Souloumiac Action editor: Vincent Dumoulin https://openreview.net/forum?id=DCAeXwLenB #adaptation #transport #mixture

An analysis of the noise schedule for score-based generative models Stanislas Strasman, Antonio Ocello, Claire Boyer, Sylvain Le Corff, Vincent Lemaire Action editor: Bruno Loureiro https://openreview.net/forum?id=BlYIPa0Fx1 #generative #wasserstein #hyperparameters

Since everyone wants to learn RL for language models now, post-DeepSeek, a reminder that I've been working on this book quietly in the background for months. Policy gradient chapter is coming together. Plugging away at the book every day now. rlhfbook.com/c/11-policy-...