TACO, a training-free method using NMF to co-factorize audio and visual features from pre-trained models, achieved state-of-the-art unsupervised sound-prompted segmentation.

Comments