We jointly train our model on high-quality text-audio pairs as well as videos, enabling our model to generate full-bandwidth professional audio with fine-grained creative control and synchronization.
Post image

Comments