michelolzam.bsky.social
🎧 Machine Listening Researcher
7 posts 233 followers 1,019 following

📢 The short description of the tasks is now available on the website 👇 dcase.community/challenge2025/

Transformers Laid Out by Pramod Goyal:
- Gives an intuition of how transformers work
- Explains what each section of the paper means and how you can understand and implement it
- Codes it up in PyTorch from a beginner's perspective
goyalpramod.github.io/blogs/Transf...
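The core operation such walkthroughs build up to is scaled dot-product attention. A minimal sketch of that one step (in NumPy for brevity; the blog itself uses PyTorch, and the function name and toy shapes here are my own, not from the post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the heart of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # attention-weighted mix of values

# toy example: 3 tokens with 4-dim embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

Each output row is a convex combination of the value rows, with mixing weights set by how well that token's query matches every key.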

If you're at #NeurIPS2024, join @hugomlrd.bsky.social to learn how to bridge the audio-visual modality gap and give your vision-language model the power to hear! 🤖👂 NeurIPS link: neurips.cc/virtual/2024... Paper: arxiv.org/pdf/2410.05997 🧪📍Poster #3602 (East Hall A-C)

new paper! 🗣️Sketch2Sound💥 Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals. paper: arxiv.org/abs/2412.08550 web: hugofloresgarcia.art/sketch2sound

The tasks for DCASE challenge 2025 have been announced. dcase.community/articles/cha... Stay tuned for more details.

It's possible to do good machine learning research, even without impossibly huge data, without enormous compute clusters, without architecture hacking, and without making unrealistic assumptions of convexity, Gaussianity, etc. Intriguing Properties of Robust Classification arxiv.org/abs/2412.04245

🚨🚨My team @GoogleDeepMind in Tokyo is looking for a talented research scientist to work on audio generative models! 🔊 Please consider applying if you have expertise in the domain or related areas such as multimodal models, video generation 📹, etc. boards.greenhouse.io/deepmind/job...

Graph Transformers (GTs) can handle long-range dependencies and resolve information bottlenecks, but they’re computationally expensive. Our new model, Spexphormer, helps scale them to much larger graphs – check it out at NeurIPS next week, or the preview here! [1/13] #NeurIPS2024

TACO, a training-free method that uses NMF to co-factorize audio and visual features from pre-trained models, achieves state-of-the-art performance on unsupervised sound-prompted segmentation.
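The co-factorization idea can be sketched as joint NMF with a shared activation matrix: audio features A and visual features V (over the same time frames) are factorized as A ≈ Wa·H and V ≈ Wv·H, so the shared H ties sound events to image regions. This is an illustrative sketch with standard multiplicative updates, not TACO's exact formulation; `co_nmf` and its arguments are hypothetical names:

```python
import numpy as np

def co_nmf(A, V, k, n_iter=200, eps=1e-9):
    """Jointly factorize non-negative audio features A (da x T) and visual
    features V (dv x T) with a shared activation matrix H (k x T):
    A ~ Wa @ H and V ~ Wv @ H. Multiplicative-update sketch only."""
    rng = np.random.default_rng(0)
    Wa = rng.random((A.shape[0], k))
    Wv = rng.random((V.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(n_iter):
        # per-modality dictionary updates (standard Lee-Seung rule)
        Wa *= (A @ H.T) / (Wa @ H @ H.T + eps)
        Wv *= (V @ H.T) / (Wv @ H @ H.T + eps)
        # shared H accumulates evidence from both modalities
        H *= (Wa.T @ A + Wv.T @ V) / (Wa.T @ Wa @ H + Wv.T @ Wv @ H + eps)
    return Wa, Wv, H
```

Because the updates are multiplicative on non-negative initializations, all factors stay non-negative, and each component of H can be read as the activation of one shared audio-visual concept over time.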

Shouldn't be any other way! ☺️

🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist? Turns out you can, and here is how: arxiv.org/abs/2411.15099 Really excited to share this work on multimodal pretraining as my first Bluesky entry! 🧵 A short and hopefully informative thread:

I made a starter pack for people working or interested in multi-modality learning. It would be good to add lots more people so do comment and I'll add! go.bsky.app/97fAH2N

Do you know what rating you’ll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried how it will look if your rating is the only 8 among 3’s? This thread is for you.

I was deeply disappointed by the lack of nature/science/climate/enviro on many major end-of-year book lists—so I decided to make my own! Introducing: ✨🎁📚 The 2024 Holiday Gift Guide to Nature & Science Books ✨🎁📚 Please share: Let's make this go viral in time for Black Friday / holiday shopping!

La Era de la Inteligencia Artificial (The Age of Artificial Intelligence), a short documentary produced by Telemundo Houston, won a Lone Star Emmy in the Science category. www.telemundohouston.com/noticias/tec...

We published an extended version of our #ICASSP2023 paper: EPIC-SOUNDS: A Large-scale Dataset of Actions That Sound + sound event detection baseline + detailed annotations pipeline + analysis of visual vs audio events + audio-visual models arxiv.org/abs/2302.006...

Don't let the scores break your spirit! 💪

Outlined an AI research review article for December… I love traveling but I also can’t wait to be back on my computer 😅. In the meantime, if you are curious how Multimodal LLMs work, I recently wrote an article to explain the main & recent approaches: magazine.sebastianraschka.com/p/understand...

What an awesome video about the Schrödinger equation! www.youtube.com/watch?v=uVKM... Young people have no idea how they live in a golden age w.r.t. access to knowledge.

Interested in machine learning in science? Timo and I recently published a book, and even if you are not a scientist, you'll find useful overviews of topics like causality and robustness. The best part is that you can read it for free: ml-science-book.com

We're here too now! 🥳

For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips over many years. It's long but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...

I initiated a starter pack for Audio ML. Let me know if you'd like to be added/removed. go.bsky.app/LGmct4z