huytransformer.com
ML, Phys. We're living in a simulation 🧑‍💻
76 posts 5,206 followers 1,522 following

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets Hikari Otsuka, Daiki Chijiwa, Ángel López García-Arias et al. Action editor: Zhangyang Wang https://openreview.net/forum?id=xpnPYfufhz #imagenet #subnetworks #sparser

Sangwoong Yoon, Himchan Hwang, Hyeokju Jeong, Dong Kyu Shin, Che-Sang Park, Sehee Kwon, Frank Chongwoo Park: Value Gradient Sampler: Sampling as Sequential Decision Making https://arxiv.org/abs/2502.13280 https://arxiv.org/pdf/2502.13280 https://arxiv.org/html/2502.13280

The Hausdorff distance quantifies how far two sets are from each other: it is the largest distance from a point in either set to its nearest point in the other. It captures worst-case deviation: if one set has points far from the other, the distance is large.
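In symbols, for subsets $A, B$ of a metric space $(X, d)$:

$$ d_H(A, B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} d(a, b), \;\; \sup_{b \in B} \inf_{a \in A} d(a, b) \Big\} $$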

Signed Distance Functions/Fields define shapes implicitly by measuring the shortest distance from any point to the surface. Negative values are inside, and positive values are outside. SDFs allow smooth blending and Boolean ops for procedural modeling. https://iquilezles.org/articles/distfunctions/
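A minimal Python sketch of the idea (the linked article works in GLSL; the smooth union below follows its polynomial smooth-min formulation):

```python
import numpy as np

def sd_sphere(p, center, radius):
    # signed distance to a sphere: negative inside, zero on the surface, positive outside
    return np.linalg.norm(p - center) - radius

def smooth_union(d1, d2, k=0.1):
    # polynomial smooth-min: blends two distance fields into one rounded union
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

p = np.array([0.0, 0.0, 0.0])
d = smooth_union(sd_sphere(p, np.array([ 0.3, 0.0, 0.0]), 0.5),
                 sd_sphere(p, np.array([-0.3, 0.0, 0.0]), 0.5))
print(d)  # negative: the query point lies inside the blended shape
```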

Excited about our progress in characterizing The Computational Advantage of Depth in Learning with Neural Networks. Check out the number of samples that can be saved when GD runs on a multi-layer rather than on a two-layer neural network. arxiv.org/pdf/2502.13961

All the good stuff from decades ago is now coming back and making its way into modern deep architectures: PCA, wavelets, Fourier transform ... Now waiting for the Hough transform and Zernike moments 😀

I'm planning to resume my habit of periodically publishing old blog notes which I've written up (many of which originate in old tweet threads and such), even if they're not especially polished. One which I still find pretty cute: 'Gaussian Smoothing' hackmd.io/@sp-monte-ca...

Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Menapace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, Aliaksandr Siarohin Improving the Diffusability of Autoencoders https://arxiv.org/abs/2502.14831

Leo Zhang, Peter Potaptchik, Arnaud Doucet, Hai-Dang Dau, Saifuddin Syed Generalised Parallel Tempering: Flexible Replica Exchange via Flows and Diffusions https://arxiv.org/abs/2502.10328

arxiv.org/abs/2006.10739

📣 New preprint 📣 Learning Theory for Kernel Bilevel Optimization w/ @fareselkhoury.bsky.social E. Pauwels @michael-arbel.bsky.social We provide generalization error bounds for bilevel optimization problems where the inner objective is minimized over an RKHS. arxiv.org/abs/2502.08457

Are you still using LoRA to fine-tune your LLM? 2024 has seen an explosion of new parameter-efficient fine-tuning (PEFT) techniques, thanks to clever uses of the singular value decomposition (SVD). Let's dive into the alphabet soup: SVF, SVFT, MiLoRA, PiSSA, LoRA-XS 🤯...
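A toy sketch of the shared idea behind the SVD-based variants (not any one paper's exact recipe): factor a frozen pretrained weight as U diag(S) Vᵀ and train only a small set of parameters tied to that factorization, here a rescaling of the singular values:

```python
import torch

W = torch.randn(768, 768)                       # stand-in for a pretrained weight
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
U, S, Vh = U.detach(), S.detach(), Vh.detach()  # frozen SVD factors

scale = torch.nn.Parameter(torch.ones_like(S))  # the only trainable tensor

def adapted_forward(x):
    # effective weight is U diag(S * scale) V^T; gradients only reach `scale`
    return x @ (Vh.T * (S * scale)) @ U.T

x = torch.randn(4, 768)
loss = adapted_forward(x).pow(2).mean()
loss.backward()
print(scale.grad.shape)  # torch.Size([768]): 768 trainable values vs 768**2 frozen ones
```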

Implemented the Principle of Stationary Action (for a minimum) using Apple's MLX autograd, bridging variational classical mechanics and modern ML gradients. The animation shows the convergence to the true equation of motion by minimizing the action integral. Made with #python #mlx #matplotlib
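Not the original code, just a rough sketch of the same idea (assuming mlx.core's grad/linspace/concatenate API), with a harmonic-oscillator potential and fixed endpoints:

```python
import mlx.core as mx

# Discretize a path x(t) on [0, T] with endpoints x(0) = 0, x(T) = 1, then run
# gradient descent on the discretized action S[x] = sum(0.5*v^2 - V(x)) * dt,
# using MLX autograd for dS/dx. Interior points move; endpoints stay clamped.
n, dt = 41, 0.05                                   # T = 2.0
x = mx.linspace(0.0, 1.0, n)                       # initial guess: a straight line
mask = mx.concatenate([mx.zeros(1), mx.ones(n - 2), mx.zeros(1)])

def action(path):
    v = (path[1:] - path[:-1]) / dt                # finite-difference velocities
    kinetic = 0.5 * mx.sum(v ** 2) * dt
    potential = mx.sum(0.5 * path ** 2) * dt       # harmonic potential V(x) = x^2 / 2
    return kinetic - potential

grad_action = mx.grad(action)
for _ in range(10_000):
    x = x - 0.02 * mask * grad_action(x)           # plain gradient descent on the path

# x now approximates the true trajectory x(t) = sin(t) / sin(2.0)
```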

The GAN is dead; long live the GAN! A Modern Baseline GAN This is a very interesting paper, exploring making GANs simpler and more performant. abs: arxiv.org/abs/2501.05441 code: github.com/brownvc/R3GAN

📢PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, so the FOMO is real😵‍💫 Luckily it's all recorded, so I've been catching up on what I missed. Thread below with some personal highlights🧵

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot... A book to learn all about 5D parallelism, ZeRO, CUDA kernels, and how/why to overlap compute & comms, with theory, motivation, interactive plots and 4000+ experiments!

Variational Flow Matching goes Riemannian! 🔮 In this preliminary work, we derive a variational objective for probability flows 🌀 on manifolds with closed-form geodesics, and discuss some interesting results. Dream team: Floor, Alison & Erik (their @ below) 💥 📜 arxiv.org/abs/2502.12981 🧵1/5

What are the grand challenges in Bayesian computation? statmodeling.stat.columbia.edu/2025/02/19/w...

LEAPS is a groundbreaking algorithm for efficient sampling from discrete distributions via deep learning. Using continuous-time Markov chains, it minimizes variance in importance weights and shows promise for applications in statistical physics and beyond. https://arxiv.org/abs/2502.10843

James Thornton, Louis Bethune, Ruixiang Zhang, Arwen Bradley, Preetum Nakkiran, Shuangfei Zhai Composition and Control with Distilled Energy Diffusion Models and Sequential Monte Carlo https://arxiv.org/abs/2502.12786

Boris Meinardus: How I'd learn ML in 2025 (if I could start over) www.youtube.com/watch?v=_xIw.... (me too 😄)

These blogs for RBC Borealis consider infinite-width neural networks from 4 viewpoints. We use gradient descent or a Bayesian approach, and, for each, we focus on either the weights or output function. This leads to the Neural Tangent Kernel, Bayesian NNs and NNGPs. Enjoy! tinyurl.com/yfsts565

Here's the 2nd part of my series on ODEs and SDEs in ML. This article introduces ODEs and is suitable for novices: rbcborealis.com/research-blo... We describe ODEs, vector ODEs and PDEs and categorize ODEs by how their solutions are related. We describe conditions for an ODE to have a solution.

A recording of my talk from this afternoon: youtu.be/jSeXZ6IjKn8?...

bc i haven't done so yet, i decided to burn any remaining bridge to the land of statistics. it wasn't statisticians nor statistics but it was me. i am simply not good enough to do statistics myself. so, @peyrardmax.bsky.social and i decided to turn statistical estimation into supervised learning.

Zhicong Tang, Jianmin Bao, Dong Chen, Baining Guo Diffusion Models without Classifier-free Guidance https://arxiv.org/abs/2502.12154

The Clarke derivative generalizes the convex subdifferential to nonconvex locally Lipschitz functions. By Rademacher's theorem such functions are differentiable almost everywhere, and the Clarke subdifferential at a point is the convex hull of all limits of gradients taken along sequences of differentiability points converging to that point. https://ams.org/journals/tran/1975-205-00/S0002-9947-1975-0367131-6/home.html
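Concretely, for locally Lipschitz $f:\mathbb{R}^n \to \mathbb{R}$:

$$ \partial_C f(x) = \operatorname{conv}\Big\{ \lim_{k \to \infty} \nabla f(x_k) \;:\; x_k \to x, \ f \text{ differentiable at } x_k \Big\} $$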

One of the most interesting new research directions in the LLM space: introducing Large Language Diffusion Models. arxiv.org/pdf/2502.09992

Interesting work from folks at Apple. I like the catch-all term "rectified flow matching." arxiv.org/abs/2502.09616

This blog post by @drscotthawley.bsky.social provides a very accessible overview of flow matching / rectified flow and reflow, based on intuitions from physics, rather than starting from probability distributions. The visualisations and animations are excellent, and the whole thing is also a colab!

Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries https://arxiv.org/abs/2302.12828

Finite mixture models are useful when data comes from multiple latent processes. BayesFlow allows:
• Approximating the joint posterior of model parameters and mixture indicators
• Inferences for independent and dependent mixtures
• Amortization for fast and accurate estimation
📄 Preprint 💻 Code

Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: boyuan.space/history-guidance (1/7)

I'm very excited to share notes on Probabilistic AI that I have been writing with @arkrause.bsky.social 🥳 arxiv.org/pdf/2502.05244 These notes aim to give a graduate-level introduction to probabilistic ML + sequential decision-making. I'm super glad to be able to share them with all of you now!

Great interview with @jascha.sohldickstein.com about diffusion models! This is the first in a series: similar interviews with Yang Song and yours truly will follow soon. (One of these is not like the others -- both of them basically invented the field, and I occasionally write a blog post 🥲)

Very nicely produced video on diffusion models. youtu.be/1pgiu--4W3I

@cvprconference.bsky.social go.bsky.app/45EuhSi

finally managed to sneak my dog into a paper: arxiv.org/abs/2502.04549

How to accelerate the inference of discrete diffusion model / improve its accuracy? Please check out arxiv.org/abs/2502.00234 . Grateful to have worked with a luxurious team of Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, @grant.rotskoff.cc and @lexingying.bsky.social !

Really excited about this! We note a connection between diffusion/flow models and neural/latent SDEs. We show how to use this for simulation-free learning of fully flexible SDEs. We refer to this as SDE Matching and show speed improvements of several orders of magnitude. arxiv.org/abs/2502.02472

Better diffusions with scoring rules! Fewer, larger denoising steps using distributional losses; learn the posterior distribution of clean samples given the noisy versions. arxiv.org/pdf/2502.02483 @vdebortoli.bsky.social Galashov Guntupalli Zhou @sirbayes.bsky.social @arnauddoucet.bsky.social

Does anyone have pointers to tutorials on double descent and what it tells us about "overfitting"?

Keen to digest this one: arxiv.org/abs/2502.01353 'A coupling approach to Lipschitz transport maps' - Giovanni Conforti, Katharina Eichinger

Optimal Transport for Domain Adaptation through Gaussian Mixture Models Eduardo Fernandes Montesuma, Fred Maurice NGOLE MBOULA, Antoine Souloumiac Action editor: Vincent Dumoulin https://openreview.net/forum?id=DCAeXwLenB #adaptation #transport #mixture

An analysis of the noise schedule for score-based generative models Stanislas Strasman, Antonio Ocello, Claire Boyer, Sylvain Le Corff, Vincent Lemaire Action editor: Bruno Loureiro https://openreview.net/forum?id=BlYIPa0Fx1 #generative #wasserstein #hyperparameters

Since everyone wants to learn RL for language models now, post-DeepSeek, a reminder that I've been working on this book quietly in the background for months. Policy gradient chapter is coming together. Plugging away at the book every day now. rlhfbook.com/c/11-policy-...