michelolzam.bsky.social
🎧 Machine Listening Researcher
7 posts 233 followers 1,019 following

📢 The short description of the tasks is now available on the website 👇 dcase.community/challenge2025/

Transformers Laid Out by Pramod Goyal:
- Gives an intuition of how transformers work
- Explains what each section of the paper means and how you can understand and implement it
- Codes it up in PyTorch from a beginner's perspective
goyalpramod.github.io/blogs/Transf...
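The core operation such walkthroughs build up to is scaled dot-product attention. A minimal sketch of that one step (in NumPy for brevity; the blog itself uses PyTorch, and the function name and toy shapes here are my own, not from the post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the heart of the transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # attention-weighted mix of values

# toy example: 3 tokens with 4-dim embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

Each output row is a convex combination of the value rows, with mixing weights set by how well that token's query matches every key.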

If you're at #NeurIPS2024, join @hugomlrd.bsky.social to learn how to bridge the audio-visual modality gap and give your vision-language model the power to hear! 🤖👂 NeurIPS link: neurips.cc/virtual/2024... Paper: arxiv.org/pdf/2410.05997 🧪📍Poster #3602 (East Hall A-C)

new paper! 🗣️Sketch2Sound💥 Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals. paper: arxiv.org/abs/2412.08550 web: hugofloresgarcia.art/sketch2sound

The tasks for DCASE challenge 2025 have been announced. dcase.community/articles/cha... Stay tuned for more details.

It's possible to do good machine learning research, even without impossibly huge data, without enormous compute clusters, without architecture hacking, and without making unrealistic assumptions of convexity, Gaussianity, etc. Intriguing Properties of Robust Classification arxiv.org/abs/2412.04245

🚨🚨My team @GoogleDeepMind in Tokyo is looking for a talented research scientist to work on audio generative models! 🔊 Please consider applying if you have expertise in the domain or related areas such as multimodal models, video generation 📹, etc. boards.greenhouse.io/deepmind/job...

Graph Transformers (GTs) can handle long-range dependencies and resolve information bottlenecks, but they’re computationally expensive. Our new model, Spexphormer, helps scale them to much larger graphs – check it out at NeurIPS next week, or the preview here! [1/13] #NeurIPS2024

TACO, a training-free method that uses NMF to co-factorize audio and visual features from pre-trained models, achieves state-of-the-art performance on unsupervised sound-prompted segmentation.
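The co-factorization idea can be sketched as joint NMF with a shared activation matrix: audio features A and visual features V (over the same time frames) are factorized as A ≈ Wa·H and V ≈ Wv·H, so the shared H ties sound events to image regions. This is an illustrative sketch with standard multiplicative updates, not TACO's exact formulation; `co_nmf` and its arguments are hypothetical names:

```python
import numpy as np

def co_nmf(A, V, k, n_iter=200, eps=1e-9):
    """Jointly factorize non-negative audio features A (da x T) and visual
    features V (dv x T) with a shared activation matrix H (k x T):
    A ~ Wa @ H and V ~ Wv @ H. Multiplicative-update sketch only."""
    rng = np.random.default_rng(0)
    Wa = rng.random((A.shape[0], k))
    Wv = rng.random((V.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(n_iter):
        # per-modality dictionary updates (standard Lee-Seung rule)
        Wa *= (A @ H.T) / (Wa @ H @ H.T + eps)
        Wv *= (V @ H.T) / (Wv @ H @ H.T + eps)
        # shared H accumulates evidence from both modalities
        H *= (Wa.T @ A + Wv.T @ V) / (Wa.T @ Wa @ H + Wv.T @ Wv @ H + eps)
    return Wa, Wv, H
```

Because the updates are multiplicative on non-negative initializations, all factors stay non-negative, and each component of H can be read as the activation of one shared audio-visual concept over time.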

Shouldn't be any other way! ☺️

🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist? Turns out you can, and here is how: arxiv.org/abs/2411.15099 Really excited to share this work on multimodal pretraining as my first Bluesky entry! 🧵 A short and hopefully informative thread:

I made a starter pack for people working or interested in multi-modality learning. It would be good to add lots more people so do comment and I'll add! go.bsky.app/97fAH2N

Do you know what rating you’ll give after reading the intro? Are your confidence scores 4 or higher? Do you not respond in rebuttal phases? Are you worried how it will look if your rating is the only 8 among 3’s? This thread is for you.

I was deeply disappointed by the lack of nature/science/climate/enviro on many major end-of-year book lists—so I decided to make my own! Introducing: ✨🎁📚 The 2024 Holiday Gift Guide to Nature & Science Books ✨🎁📚 Please share: Let's make this go viral in time for Black Friday / holiday shopping!

La Era de la Inteligencia Artificial (The Age of Artificial Intelligence), a short documentary produced by Telemundo Houston, won a Lone Star Emmy in the Science category. www.telemundohouston.com/noticias/tec...

We published an extended version of our #ICASSP2023 paper: EPIC-SOUNDS: A Large-scale Dataset of Actions That Sound + sound event detection baseline + detailed annotations pipeline + analysis of visual vs audio events + audio-visual models arxiv.org/abs/2302.006...

Don't let the scores break your spirit! 💪

Outlined an AI research review article for December… I love traveling but I also can’t wait to be back on my computer 😅. In the meantime, if you are curious how Multimodal LLMs work, I recently wrote an article to explain the main & recent approaches: magazine.sebastianraschka.com/p/understand...

What an awesome video about the Schrödinger equation! www.youtube.com/watch?v=uVKM... Young people have no idea how they live in a golden age w.r.t. access to knowledge.

Interested in machine learning in science? Timo and I recently published a book, and even if you are not a scientist, you'll find useful overviews of topics like causality and robustness. The best part is that you can read it for free: ml-science-book.com

We're here too now! 🥳

For those who missed this post on the-network-that-is-not-to-be-named, I made public my "secrets" for writing a good CVPR paper (or any scientific paper). I've compiled these tips over many years. It's long but hopefully it helps people write better papers. perceiving-systems.blog/en/post/writ...

I initiated a starter pack for Audio ML. Let me know if you'd like to be added/removed. go.bsky.app/LGmct4z