camomille5000.bsky.social
I like 2D and 3D computer vision, and machine learning.
1 post · 38 followers · 475 following

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

hahahahah there were actually two technical reports for RL reasoning models today; Kimi 1.5 also has good stuff on reward shaping + RL infra. Kimi 1.5 report: https://buff.ly/4jqgCOa

Slides for a general introduction to the use of Optimal Transport methods in learning, with an emphasis on diffusion models, flow matching, training two-layer neural networks, and deep transformers. speakerdeck.com/gpeyre/optim...

Another nail in the coffin of cosine similarity! I started disliking cossim some years ago for several reasons, such as the non-linearity around 0.0 and the loss of certainty information due to the normalization of feature vectors, but this study seems to give another good reason to abandon it.
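The normalization issue mentioned above can be shown in a few lines of numpy (a minimal sketch with made-up vectors, not from the study):

```python
import numpy as np

def cossim(a, b):
    # Normalizing both vectors discards their magnitudes entirely.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = 100.0 * a  # same direction, 100x the magnitude

# A large-norm ("confident") and a small-norm ("uncertain") feature vector
# pointing the same way are indistinguishable under cosine similarity:
assert np.isclose(cossim(a, b), cossim(a, a))
```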

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training The authors propose a pre-training framework using synthetic cross-modal data to enhance LoFTR and RoMa for matching across medical imaging modalities like CT, MR, PET, and SPECT. zju3dv.github.io/MatchAnything/

The GAN is dead; long live the GAN! A Modern Baseline GAN This is a very interesting paper, exploring making GANs simpler and more performant. abs: arxiv.org/abs/2501.05441 code: github.com/brownvc/R3GAN

Image matching and ChatGPT - new post in the wide baseline stereo blog. tl;dr: it is good, even feels human, but not perfect. ducha-aiki.github.io/wide-baselin...

Introducing ASAL: Automating the Search for Artificial Life with Foundation Models Blog: sakana.ai/asal/ We propose a new method called Automated Search for Artificial Life (ASAL) which uses foundation models to automate the discovery of the most interesting and open-ended artificial lifeforms!

My book is (at last) out, just in time for Christmas! A blog post to celebrate and present it: francisbach.com/my-book-is-o...

This paper looks interesting - it argues that you don’t need adaptive optimizers like Adam to get good gradient-based training; instead, you can just set a learning rate for different groups of units based on initialization: arxiv.org/abs/2412.11768 #MLSky #NeuroAI
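A hypothetical numpy sketch of the general idea (my own illustrative rule, not the paper’s exact scheme): fix one learning rate per parameter group from its initialization scale, then run plain SGD with no per-step adaptive statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with different fan-in, hence different initialization scales.
layers = {
    "w1": rng.normal(0, 1 / np.sqrt(64), size=(64, 32)),
    "w2": rng.normal(0, 1 / np.sqrt(32), size=(32, 10)),
}

# Per-group learning rates derived once from the init scale (hypothetical rule),
# instead of Adam's running first/second-moment estimates.
base_lr = 0.1
lrs = {name: base_lr * w.std() for name, w in layers.items()}

def sgd_step(params, grads, lrs):
    # Plain SGD, just with a group-specific constant step size.
    return {name: p - lrs[name] * grads[name] for name, p in params.items()}

grads = {name: np.ones_like(w) for name, w in layers.items()}
layers = sgd_step(layers, grads, lrs)
```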

New video! I revisit two recently published papers on the capacity of LLMs to lie and manipulate. Beyond the spectacular headlines like "o1 managed to escape!!!", what do these papers actually say? Well, we're going to find out. (link in the reply)

Excited to announce ScanNet++ v2!🎉 @awhiteguitar.bsky.social & Yueh-Cheng Liu have been working tirelessly to bring: 🔹1006 high-fidelity 3D scans 🔹+ DSLR & iPhone captures 🔹+ rich semantics Elevating 3D scene understanding to the next level!🚀 w/ @niessner.bsky.social kaldir.vc.in.tum.de/scannetpp

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵

Libraries and tools that every deep learning project should use: loguru, tqdm, torchmetrics, einops, Python 3.11, black. Optional: prettytable. Good for debugging: lovely_tensors. Any others I've missed? Below, a few words on each of them:

(1/2) 📢📢 GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion 📢📢 We reconstruct animatable Gaussian head avatars from monocular videos captured by commodity devices such as smartphones.

Schrödinger Bridge Flow for Unpaired Data Translation (by @vdebortoli.bsky.social et al.) It will take me some time to digest this article fully, but it's important to follow the authors' advice and read the appendices, as the examples are helpful and well-illustrated. 📄 arxiv.org/abs/2409.09347

The code for Simplified and Generalized Masked Diffusion for Discrete Data (Jiaxin Shi et al) has been released and a lecture by @arnauddoucet.bsky.social on this topic is also available! 🐍 Code: github.com/google-deepm... 📄 Article: arxiv.org/abs/2406.04329 📼 Video: www.youtube.com/watch?v=qj9B...

3D content creation with touch! We exploit tactile sensing to enhance geometric details for text- and image-to-3D generation. Check out our #NeurIPS2024 work on Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation: ruihangao.github.io/TactileDream... 1/3

Flow Matching Guide and Code arxiv.org/abs/2412.06264

(1/n) My favorite "optimizer" work of 2024: 📢 Introducing APOLLO! 🚀: SGD-like memory cost, yet AdamW-level performance (or better!). ❓ How much memory do we need for optimization states in LLM training ? 🧐 Almost zero. 📜 Paper: arxiv.org/abs/2412.05270 🔗 GitHub: github.com/zhuhanqing/A...

Normalizing Flows are Capable Generative Models Apple introduces TarFlow, a new Transformer-based variant of Masked Autoregressive Flows. SOTA on likelihood estimation for images, quality and diversity comparable to diffusion models. arxiv.org/abs/2412.06329

📢 PrEditor3D: Fast and Precise 3D Shape Editing 📢 We propose a training-free 3D shape editing approach that rapidly and precisely edits the regions intended by the user and keeps the rest as is.

Inventors of flow matching have released a comprehensive guide going over the math & code of flow matching! Also covers variants like non-Euclidean & discrete flow matching. A PyTorch library is also released with this guide! This looks like a very good read! 🔥 arxiv: arxiv.org/abs/2412.06264

What better time to announce a new paper than during NeurIPS and ACCV? happy happy happy to introduce NADA, our latest work on object detection in art! 🎨 with amazing collaborators: @patrick-ramos.bsky.social, @nicaogr.bsky.social, Selina Khan, Yuta Nakashima

New paper alert! Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data Training models requires massive amounts of labeled data. ZCore shows you that you need less labeled data to train good models. Paper Link: arxiv.org/abs/2411.15349 GitHub Repo: github.com/voxel51/zcore

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks abs: arxiv.org/abs/2412.04626 project page: bigdocs.github.io BigDocs-7.5M is a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks.

Publication-ready visualization of 3D objects and point clouds in seconds, using @blender.org and BlenderProc. hummat.github.io/bproc-pubvis/

Airborne #LiDAR has revolutionized the study of ancient rainforest civilizations by seeing through dense canopies. Yet archaeologists still annotate their data manually. Introducing Archaeoscape at #NeurIPS2024 —the first deep learning-scale, open-access archaeological dataset🧵👇

RoPE has been the one 💯 genuine upgrade to the vanilla Vaswani transformer. This beautiful blog post by Chris Fleetwood explains its significance and how rotating Q & K preserves meaning (magnitude) while encoding relative positions (angle shift) 🔥🔥
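Both properties are easy to check numerically; a minimal numpy sketch of standard RoPE (my own implementation, not the blog post's code):

```python
import numpy as np

def rope(x, pos, theta=10000.0):
    # Rotate consecutive dimension pairs of x by position-dependent angles.
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)  # one frequency per dim pair
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Rotation preserves magnitude ("meaning")...
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))

# ...and the q-k dot product depends only on the relative position:
s1 = rope(q, 7) @ rope(k, 3)    # positions 7 and 3  -> offset 4
s2 = rope(q, 14) @ rope(k, 10)  # positions 14 and 10 -> same offset 4
assert np.isclose(s1, s2)
```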

A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
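The flow-matching side of that coin fits in a few lines; a minimal sketch in my own notation (linear interpolant with a Gaussian source, not code from the blog post):

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_training_pair(x1, rng):
    # Draw Gaussian noise (the same source distribution diffusion uses),
    # pick a random time, build the linear interpolant, and return the
    # constant velocity x1 - x0 that the network v(x_t, t) should regress.
    x0 = rng.normal(size=x1.shape)
    t = rng.uniform()
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    return t, xt, v_target

x1 = rng.normal(loc=3.0, size=5)  # a toy "data" sample
t, xt, v = fm_training_pair(x1, rng)
# The same (xt, target) pair can be re-parameterized as a diffusion
# noise-prediction target, which is why the two views are interchangeable.
```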

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 arxiv.org/abs/2411.19722 A thread 👇 1/

A paper a day, episode 15. You liked the matrix cookbook? You’re gonna love this one: 100 statistics inequalities, just for your personal enjoyment. As they say in French, moi j’ai Bienaymé cet article ! (a pun on "bien aimé", "really liked") arxiv.org/abs/2102.07234
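For flavor, the namesake classic, the Bienaymé–Chebyshev inequality (presumably among the 100): for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$,

```latex
\Pr\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^2}, \qquad k > 0.
```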

My deep learning course at the University of Geneva is available on-line. 1000+ slides, ~20h of screen-casts. Full of examples in PyTorch. fleuret.org/dlc/ And my "Little Book of Deep Learning" is available as a phone-formatted pdf (nearing 700k downloads!) fleuret.org/lbdl/