vinhtong.bsky.social
comment in response to post
Many thanks to my collaborators Dung Hoang, @anjiliu.bsky.social, @guyvdb.bsky.social, and @mniepert.bsky.social.
[10/n] Paper: openreview.net/forum?id=xDr... Code: github.com/vinhsuhi/LD3...
[9/n] Beyond image generation: LD3 can be applied to diffusion models in other domains, such as molecular docking.
[8/n] LD3 is fast: it can be trained on a single GPU in under one hour. For smaller datasets like CIFAR-10, training finishes in under six minutes.
[7/n] LD3 significantly improves sample quality.
[6/n] This surrogate loss is theoretically close to the original distillation objective, leading to better convergence and avoiding underfitting.
[5/n] Soft constraint: A potential problem with the student model is its limited capacity. To address this, we propose a soft surrogate loss that simplifies the student's optimization task.
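The soft-constraint idea in [5/n] can be illustrated with a generic relaxation of a hard matching objective: a hinge penalty that waives deviations inside a tolerance radius. This is a minimal sketch of the concept only; the radius `r` and the hinge form are illustrative assumptions, not the paper's exact surrogate loss.

```python
def hard_loss(student_out, teacher_out):
    # Hard matching: every deviation from the teacher is penalized.
    return (student_out - teacher_out) ** 2

def soft_loss(student_out, teacher_out, r=0.1):
    # Soft matching: deviations inside a tolerance ball of radius r are free,
    # giving a limited-capacity student an easier target to reach.
    gap = max(0.0, abs(student_out - teacher_out) - r)
    return gap ** 2
```

A student output within `r` of the teacher incurs zero loss, and outputs beyond the tolerance are penalized less steeply than under the hard objective, which eases optimization for a low-capacity student.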
[4/n] How? LD3 uses a teacher-student framework:
🔹 Teacher: runs the ODE solver with small step sizes.
🔹 Student: learns the optimal discretization to match the teacher's output.
🔹 Training backpropagates through the ODE solver to refine the time steps.
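The teacher-student loop in [4/n] can be sketched on a toy ODE. Here the teacher is an Euler solver with very small steps, the student is the same solver with 5 learnable time points, and a finite-difference gradient stands in for true backpropagation through the solver; the ODE dx/dt = -5t²x and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
def drift(t, x):
    # Toy time-dependent ODE dx/dt = -5 t^2 x (illustrative, not a real diffusion model)
    return -5.0 * t * t * x

def euler(x0, ts):
    # Explicit Euler over the time grid ts
    x = x0
    for a, b in zip(ts, ts[1:]):
        x = x + (b - a) * drift(a, x)
    return x

# Teacher: same solver with very small steps (1000 uniform steps)
teacher_out = euler(1.0, [i / 1000 for i in range(1001)])

# Student: 5 Euler steps with learnable interior time points, initialized uniformly
theta = [0.2, 0.4, 0.6, 0.8]

def loss(theta):
    ts = [0.0] + sorted(theta) + [1.0]
    return (euler(1.0, ts) - teacher_out) ** 2

init_loss = loss(theta)

# Gradient descent on the time points; central finite differences approximate
# the gradient that LD3 obtains by backpropagating through the solver.
lr, eps = 0.5, 1e-5
for _ in range(200):
    grad = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        dn = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        grad.append((loss(up) - loss(dn)) / (2 * eps))
    theta = [p - lr * g for p, g in zip(theta, grad)]

final_loss = loss(theta)
```

After training, the 5-step student's output matches the fine-grained teacher far more closely than the uniform initialization did, which is the effect the thread describes.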
[3/n] Key idea: LD3 optimizes the time discretization for diffusion ODE solvers by minimizing the global truncation error, resulting in higher sample quality with fewer sampling steps.
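The effect described in [3/n] can be demonstrated on a toy ODE with a known solution: with a fixed step budget, the choice of time grid changes the global truncation error, and a non-uniform grid can beat the uniform one. The ODE dx/dt = -5t²x and the random search (standing in for LD3's gradient-based optimization) are illustrative assumptions.

```python
import math
import random

def euler_solve(x0, ts):
    # Explicit Euler for the toy ODE dx/dt = -5 t^2 x over the time grid ts
    x = x0
    for a, b in zip(ts, ts[1:]):
        x = x + (b - a) * (-5.0 * a * a * x)
    return x

exact = math.exp(-5.0 / 3.0)  # closed-form solution x(1) for x0 = 1

# Baseline: uniform discretization with a budget of 5 steps
uniform = [i / 5 for i in range(6)]
err_uniform = abs(euler_solve(1.0, uniform) - exact)

# Search over monotone 5-step grids for a schedule with lower global error
random.seed(0)
best_err = err_uniform
for _ in range(5000):
    ts = [0.0] + sorted(random.random() for _ in range(4)) + [1.0]
    best_err = min(best_err, abs(euler_solve(1.0, ts) - exact))
```

The searched schedule achieves a strictly lower global truncation error than the uniform grid at the same step count, which is precisely the quantity LD3 learns to minimize.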
[2/n] Diffusion models produce high-quality generations but are computationally expensive due to multi-step sampling. Existing acceleration methods either require costly retraining (distillation) or depend on manually designed time discretization heuristics. LD3 changes that.