vinhtong.bsky.social
comment in response to post
Many thanks to my collaborators Dung Hoang, @anjiliu.bsky.social, @guyvdb.bsky.social, and @mniepert.bsky.social.
[10/n] Paper: openreview.net/forum?id=xDr... Code: github.com/vinhsuhi/LD3...
[9/n] Beyond image generation: LD3 can be applied to diffusion models in other domains, such as molecular docking.
[8/n] LD3 is fast: it can be trained on a single GPU in under one hour. For smaller datasets like CIFAR-10, training finishes in under six minutes.
[7/n] LD3 significantly improves sample quality.
[6/n] This surrogate loss is theoretically close to the original distillation objective, leading to better convergence and avoiding underfitting.
[5/n] Soft constraint: A potential problem with the student model is its limited capacity. To address this, we propose a soft surrogate loss that simplifies the student's optimization task.
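The soft-constraint idea in [5/n] can be illustrated with a generic relaxation of a hard matching objective: a hinge penalty that waives deviations inside a tolerance radius. This is a minimal sketch of the concept only; the radius `r` and the hinge form are illustrative assumptions, not the paper's exact surrogate loss.

```python
def hard_loss(student_out, teacher_out):
    # Hard matching: every deviation from the teacher is penalized.
    return (student_out - teacher_out) ** 2

def soft_loss(student_out, teacher_out, r=0.1):
    # Soft matching: deviations inside a tolerance ball of radius r are free,
    # giving a limited-capacity student an easier target to reach.
    gap = max(0.0, abs(student_out - teacher_out) - r)
    return gap ** 2
```

A student output within `r` of the teacher incurs zero loss, and outputs beyond the tolerance are penalized less steeply than under the hard objective, which eases optimization for a low-capacity student.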
[4/n] How? LD3 uses a teacher-student framework:
🔹 Teacher: runs the ODE solver with small step sizes.
🔹 Student: learns the optimal discretization to match the teacher's output.
🔹 Training backpropagates through the ODE solver to refine the time steps.
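The teacher-student loop in [4/n] can be sketched on a toy ODE. Here the teacher is an Euler solver with very small steps, the student is the same solver with 5 learnable time points, and a finite-difference gradient stands in for true backpropagation through the solver; the ODE dx/dt = -5t²x and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
def drift(t, x):
    # Toy time-dependent ODE dx/dt = -5 t^2 x (illustrative, not a real diffusion model)
    return -5.0 * t * t * x

def euler(x0, ts):
    # Explicit Euler over the time grid ts
    x = x0
    for a, b in zip(ts, ts[1:]):
        x = x + (b - a) * drift(a, x)
    return x

# Teacher: same solver with very small steps (1000 uniform steps)
teacher_out = euler(1.0, [i / 1000 for i in range(1001)])

# Student: 5 Euler steps with learnable interior time points, initialized uniformly
theta = [0.2, 0.4, 0.6, 0.8]

def loss(theta):
    ts = [0.0] + sorted(theta) + [1.0]
    return (euler(1.0, ts) - teacher_out) ** 2

init_loss = loss(theta)

# Gradient descent on the time points; central finite differences approximate
# the gradient that LD3 obtains by backpropagating through the solver.
lr, eps = 0.5, 1e-5
for _ in range(200):
    grad = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        dn = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        grad.append((loss(up) - loss(dn)) / (2 * eps))
    theta = [p - lr * g for p, g in zip(theta, grad)]

final_loss = loss(theta)
```

After training, the 5-step student's output matches the fine-grained teacher far more closely than the uniform initialization did, which is the effect the thread describes.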
[3/n] Key idea: LD3 optimizes the time discretization for diffusion ODE solvers by minimizing the global truncation error, resulting in higher sample quality with fewer sampling steps.
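The effect described in [3/n] can be demonstrated on a toy ODE with a known solution: with a fixed step budget, the choice of time grid changes the global truncation error, and a non-uniform grid can beat the uniform one. The ODE dx/dt = -5t²x and the random search (standing in for LD3's gradient-based optimization) are illustrative assumptions.

```python
import math
import random

def euler_solve(x0, ts):
    # Explicit Euler for the toy ODE dx/dt = -5 t^2 x over the time grid ts
    x = x0
    for a, b in zip(ts, ts[1:]):
        x = x + (b - a) * (-5.0 * a * a * x)
    return x

exact = math.exp(-5.0 / 3.0)  # closed-form solution x(1) for x0 = 1

# Baseline: uniform discretization with a budget of 5 steps
uniform = [i / 5 for i in range(6)]
err_uniform = abs(euler_solve(1.0, uniform) - exact)

# Search over monotone 5-step grids for a schedule with lower global error
random.seed(0)
best_err = err_uniform
for _ in range(5000):
    ts = [0.0] + sorted(random.random() for _ in range(4)) + [1.0]
    best_err = min(best_err, abs(euler_solve(1.0, ts) - exact))
```

The searched schedule achieves a strictly lower global truncation error than the uniform grid at the same step count, which is precisely the quantity LD3 learns to minimize.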
[2/n] Diffusion models produce high-quality generations but are computationally expensive due to multi-step sampling. Existing acceleration methods either require costly retraining (distillation) or depend on manually designed time discretization heuristics. LD3 changes that.