czyang.bsky.social
Ph.D. Student @ UMich EECS. Multimodal learning, audio-visual learning, and computer vision.
Prev. research intern @Adobe and @Meta.
https://ificl.github.io/
This work was done during my internship at Adobe Research. Big thanks to all my collaborators @pseeth.bsky.social, Bryan Russell, @urinieto.bsky.social, David Bourgin, @andrewowens.bsky.social, and @justinsalamon.bsky.social!
We jointly train our model on high-quality text-audio pairs as well as on videos, enabling it to generate full-bandwidth, professional-quality audio with fine-grained creative control and synchronization.
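A minimal sketch of what this kind of joint training could look like, purely as an illustration: the post only says the model is trained on both text-audio pairs and videos, so the loaders and the `model.training_loss` interface below are assumptions, not MultiFoley's actual training code.

```python
# Hypothetical sketch: alternate batches from a text-audio dataset and a
# video-audio dataset so a single model learns both conditioning types.
def joint_training_epoch(text_audio_loader, video_audio_loader, model, optimizer):
    """Run one epoch that interleaves the two data sources."""
    for text_audio_batch, video_batch in zip(text_audio_loader, video_audio_loader):
        for batch in (text_audio_batch, video_batch):
            loss = model.training_loss(batch)  # assumed interface; conditioning inferred from batch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```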
MultiFoley is a unified framework for video-guided audio generation that leverages text, audio, and video conditioning within a single model. As a result, we can do text-guided foley, audio-guided foley (e.g., syncing your favorite sample with the video), and foley audio extension.
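To make the three uses concrete, here is a hedged sketch of how such a single-model interface might be organized. None of this is MultiFoley's actual API; the class, method names, and parameters are assumptions used only to illustrate text-guided foley, audio-guided foley, and foley audio extension going through one model.

```python
# Hypothetical sketch of a unified conditioning interface (not MultiFoley's API).
from dataclasses import dataclass
from typing import Optional

@dataclass
class FoleyRequest:
    video_path: str                        # video to generate synchronized audio for
    text_prompt: Optional[str] = None      # e.g. "skateboard wheels rolling on gravel"
    reference_audio: Optional[str] = None  # favorite sample to sync with the video
    partial_audio: Optional[str] = None    # existing audio to extend

def run_foley(model, request: FoleyRequest):
    """Dispatch a request to the appropriate conditioning mode of a single model."""
    if request.reference_audio is not None:      # audio-guided foley
        return model.generate(video=request.video_path, audio=request.reference_audio)
    if request.partial_audio is not None:        # foley audio extension
        return model.extend(video=request.video_path, audio=request.partial_audio)
    return model.generate(video=request.video_path, text=request.text_prompt)  # text-guided foley
```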