ilyassmoummad.bsky.social
Deep Representation Learning for Ecology. Interested in Computer Vision, Natural Language Processing, and Machine Listening. GScholar: https://scholar.google.com/citations?user=2hA2XZcAAAAJ
36 posts 297 followers 287 following

Long-audio understanding: Audio Flamingo 2 (AF2), using a custom CLAP model, synthetic data, and multi-stage curriculum learning, achieved state-of-the-art performance on over 20 benchmarks, including a new long-audio dataset (LongAudio).

Interesting talk by Yi Ma about the nature of intelligence, what we have done so far in AI, and what to do next: scds1001.dirk.hk/L-2.html

🎉 Celebrating 100,000 Modeled Taxa in the iNaturalist Open Range Map Dataset! To mark this milestone, we're making model-generated distribution data even more accessible. Explore, analyze, and use this data to power biodiversity research! 🌍🔍 www.inaturalist.org/posts/106918

Kernel Audio Distance (KAD), a new audio generation evaluation metric, was proposed, showing faster convergence, lower computational cost, and better alignment with human perception than Fréchet Audio Distance (FAD). It leverages MMD and advanced embeddings. GPU acceleration was used.
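At the heart of KAD is the Maximum Mean Discrepancy (MMD). A minimal sketch of an unbiased MMD² estimator with a Gaussian kernel (function names and the bandwidth are illustrative; this is not the KAD implementation, which uses learned audio embeddings):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between rows of x and y.
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimate of squared MMD between sample sets x and y.
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Drop diagonal self-similarity terms for the unbiased estimator.
    sum_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    sum_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return sum_xx + sum_yy - 2 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, (200, 8)), rng.normal(0, 1, (200, 8)))
diff = mmd2_unbiased(rng.normal(0, 1, (200, 8)), rng.normal(3, 1, (200, 8)))
assert same < diff  # shifted distributions score a larger MMD
```

Unlike FAD, this estimator makes no Gaussian assumption about the embedding distribution, which is part of the appeal.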

Are you still using LoRA to fine-tune your LLM? 2024 has seen an explosion of new parameter-efficient fine-tuning (PEFT) techniques, thanks to clever uses of the singular value decomposition (SVD). Let's dive into the alphabet soup: SVF, SVFT, MiLoRA, PiSSA, LoRA-XS 🤯...
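The common thread can be sketched in a few lines. Here is a PiSSA-style initialization (a minimal numpy illustration of the idea; the function name and rank are my own, not from any of the papers): factor the weight with an SVD, train only the principal low-rank part, and freeze the residual.

```python
import numpy as np

def pissa_init(w, r):
    # PiSSA-style split: a principal low-rank part (trainable) plus a
    # residual (frozen), built from the top-r singular triplets of W.
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :r] * np.sqrt(s[:r])          # trainable factor A (d_out x r)
    b = np.sqrt(s[:r])[:, None] * vt[:r]   # trainable factor B (r x d_in)
    residual = w - a @ b                   # frozen residual weight
    return a, b, residual

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 12))
a, b, res = pissa_init(w, r=4)
# The split reconstructs the original weight exactly at initialization.
assert np.allclose(res + a @ b, w)
```

Contrast with vanilla LoRA, which initializes the adapter at zero; here the adapter starts on the most informative directions of W.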

Launching: BioDCASE - the Bioacoustics Data Challenge! https://biodcase.github.io/ #DCASE #DCASE2025 #DCLDE #bioacoustics #ai4good

Postdoc job at Naturalis: "Postdoctoral Fellow in Machine Learning & Butterfly Ecology" https://www.naturalis.nl/en/about-us/job-opportunities/postdoctoral-fellow-in-machine-learning-butterfly-ecology (Beautiful museum, great work environment, plus The Netherlands :) #academicjobs #postdoc

Yi Ma & colleagues managed to simplify DINO & DINOv2 by removing many ingredients and adding a robust regularization term from information theory (coding rate) that learns informative, decorrelated features. Happy to see principled approaches advance deep representation learning!

Want strong SSL, but not the complexity of DINOv2? CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.

🚀 "UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate" is accepted at #ICLR2025 A thread! 📜 Paper: arxiv.org/abs/2409.01985 🖥️ Code: github.com/tachella/uns...

"Compositional Entailment Learning for Hyperbolic Vision-Language Models" #ICLR25 With: Avik Pal, Max van Spengler, Guido Maria D'Amely di Melendugno, Alessandro Flaborea, and @pascalmettes.bsky.social Paper: arxiv.org/abs/2410.06912

Short introduction to optimal transport with a simple 2D discrete example. The video was done in Manim a few years ago and I have sadly lost the original code. youtu.be/Os1xkUlwjjo
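Since the original code is lost, here is a minimal sketch (not the video's code) of discrete OT between two small 2D point clouds with uniform weights, where the transport problem reduces to an optimal assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Two small 2D point clouds with equal, uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(0, 1, (5, 2))
y = rng.normal(2, 1, (5, 2))

# Cost matrix: squared Euclidean distance between every pair of points.
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

# With uniform weights and equal sizes, OT is an assignment problem.
rows, cols = linear_sum_assignment(cost)
w2_squared = cost[rows, cols].mean()  # estimate of squared 2-Wasserstein distance
```

For unequal weights or sizes one needs a full transport plan (e.g. a linear program) rather than an assignment, but the 2D discrete intuition is the same.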

📢 The short description of the tasks is now available on the website 👇 dcase.community/challenge2025/

Great to see how all the CV pioneers thought about various CV problems back then, and how 20 years of research have changed the view on most of these problems. There is still much left to do. It would be great to repeat this series and look back on it 20 years from now.

The Nadaraya-Watson estimator is a linear local averaging estimator relying on a pointwise nonnegative kernel. Most of the time, a box or Gaussian kernel is used. https://www.jstor.org/stable/25049340?seq=2
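A minimal sketch with a Gaussian kernel (the bandwidth h and toy data are illustrative):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h=0.5):
    # Local weighted average with a Gaussian kernel of bandwidth h:
    # f(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w * y_train).sum(1) / w.sum(1)

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(0, 0.2, 100)
y_hat = nadaraya_watson(x, y, x, h=0.3)
```

The estimate is linear in the responses y_i, which is what makes its bias/variance analysis tractable.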

When I was a kid I was fascinated by SETI, the Search for Extraterrestrial Intelligence. Now we live in an era when it is becoming meaningful to search for "extraterrestrial life" not just in our universe but in simulated universes as well. This project provides new tools toward that dream:

Schrödinger Bridge Flow for Unpaired Data Translation (by @vdebortoli.bsky.social et al.) It will take me some time to digest this article fully, but it's important to follow the authors' advice and read the appendices, as the examples are helpful and well-illustrated. 📄 arxiv.org/abs/2409.09347

My book is (at last) out, just in time for Christmas! A blog post to celebrate and present it: francisbach.com/my-book-is-o...

General structure of a paper:
- general ideas
- general case
- general case
- general case
- what we actually do

How it should be:
- what we actually do
- why we think it's great as one method of a general class
- how we got there
- how we got there
- how we got there

Hyperbolic learning is growing rapidly by the day. From weekly alerts in 2023 to daily digests in 2024! From our current research, it is clear that 2025 will be a huge year for hyperbolic learning research. I gave an interview elaborating on our research: shorturl.at/CQD53

Brilliant talk by Ilya, but he's wrong on one point. We are NOT running out of data. We are running out of human-written text. We have more videos than we know what to do with. We just haven't solved pre-training in vision. Just go out and sense the world. Data is easy.

🚀 Introducing the Byte Latent Transformer (BLT) – an LLM architecture that scales better than Llama 3 using patches instead of tokens 🤯 Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Pat... Code 🛠️ github.com/facebookrese...

We @imagineenpc.bsky.social are slowly but surely entering our proposals for master's degree internships here: docs.google.com/document/d/1... These are 6-month projects that typically correspond to the end-of-study project in the French curriculum. Probably more offers to come, check it regularly.

I'm pleased to share that our recent paper with @2ptmvd has been accepted to the Philosophical Transactions of the Royal Society. Here's the ‘Accepted Author Version’: drive.google.com/file/d/1jdtr... And here it is on arxiv without the fancy formatting: arxiv.org/abs/2409.06219 1/3

I started to put together a starter pack for research in AI+Ecology, check it out and let me know if you would like to be added! go.bsky.app/8zugFF6

How do language models organize concepts and their properties? Do they use taxonomies to infer new properties, or infer based on concept similarities? Apparently, both! 🌟 New paper with my fantastic collaborators @amuuueller.bsky.social and @kanishka.bsky.social

🎯 How can we empower scientific discovery in millions of nature photos? Introducing INQUIRE: A benchmark testing if AI vision-language models can help scientists find biodiversity patterns- from disease symptoms to rare behaviors- hidden in vast image collections. Thread👇🧵

The origins of "attention", which @karpathy.bsky.social correctly calls a "brilliant (data-dependent) weighted average operation", were not in machine learning - in fact this idea dates back to data-dependent "filters" in image processing from the 90s. 1/n
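The "data-dependent weighted average" view fits in a few lines (a minimal single-head sketch with illustrative shapes, not any particular library's implementation):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Data-dependent weighted average: the weights are computed from
    # query-key similarities, then the values are averaged with them.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # convex combination of values

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention(q, k, v)
```

The 90s image-processing "filters" mentioned above (e.g. bilateral-style filtering) share exactly this structure: weights computed from the data itself, then a normalized average.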

Hellinger and Wasserstein are the two main geodesic distances on probability distributions. While both minimize the same energy, they differ in their interpolation methods: Hellinger focuses on density, whereas Wasserstein emphasizes position displacements.
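A 1D discrete sketch of the difference, using two Gaussians (illustrative choices throughout): the Hellinger geodesic blends square-root densities in place, while the 1D Wasserstein geodesic blends quantile functions, displacing mass.

```python
import numpy as np

x = np.linspace(-6, 6, 400)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x + 3) ** 2); p /= p.sum() * dx  # N(-3, 1)
q = np.exp(-0.5 * (x - 3) ** 2); q /= q.sum() * dx  # N(+3, 1)
t = 0.5

# Hellinger interpolation: blend square-root densities, renormalize.
h = ((1 - t) * np.sqrt(p) + t * np.sqrt(q)) ** 2
h /= h.sum() * dx  # stays bimodal: mass fades in place

# Wasserstein interpolation: blend quantile functions (1D displacement).
levels = np.linspace(0.01, 0.99, 200)
cp = np.cumsum(p) * dx
cq = np.cumsum(q) * dx
qp = np.interp(levels, cp, x)  # quantiles of p
qq = np.interp(levels, cq, x)  # quantiles of q
w_samples = (1 - t) * qp + t * qq  # unimodal: mass moves to the middle
```

At t = 0.5 the Hellinger midpoint keeps two bumps at ±3 with fading mass, while the Wasserstein midpoint is a single bump at 0, which is exactly the density-vs-displacement contrast.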

📢 Exciting news! My PhD defense titled "Invariant Representation Learning for Few-Shot Bioacoustic Event Detection and Classification" is happening this Monday, December 2nd, at 9 AM (CET). It'll be livestreamed on YT! 🎥 If you're interested, drop me a message for the link.

🤔 Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist? Turns out you can, and here is how: arxiv.org/abs/2411.15099 Really excited to share this work on multimodal pretraining as my first bluesky entry! 🧵 A short and hopefully informative thread:

A really cool paper from Kyutai demonstrates how model capabilities can be extended to a new domain (e.g., learning a new language) while preserving the original capabilities. This is achieved by leveraging the concept of adapters.

NeurIPS Test of Time Awards: Generative Adversarial Nets Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio Sequence to Sequence Learning with Neural Networks Ilya Sutskever, Oriol Vinyals, Quoc V. Le

So awesome to see the evolution of SFX generation from the Adobe titans!

Shannon's entropy measures the uncertainty or information content in a probability distribution. It's a concept in data compression and communication introduced in the paper “A Mathematical Theory of Communication”. https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
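The definition fits in a few lines (a minimal sketch; the function name and the base-2 convention for bits are my own choices):

```python
import numpy as np

def entropy(p, base=2):
    # H(p) = -sum_i p_i log p_i, skipping zero-probability outcomes
    # (by convention 0 * log 0 = 0).
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

assert abs(entropy([0.5, 0.5]) - 1.0) < 1e-12  # fair coin: 1 bit
assert entropy([1.0]) == 0.0                   # certain outcome: 0 bits
```

Entropy is maximized by the uniform distribution, which is why it bounds how much a lossless code can compress on average.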