camomille5000.bsky.social
I like 2D and 3D computer vision, and machine learning.
1 post · 38 followers · 475 following

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

hahahahah there were actually two technical reports for RL reasoning models today; Kimi 1.5 also has good stuff on reward shaping + RL infra. Kimi 1.5 report: https://buff.ly/4jqgCOa

Slides for a general introduction to the use of Optimal Transport methods in learning, with an emphasis on diffusion models, flow matching, training two-layer neural networks, and deep transformers. speakerdeck.com/gpeyre/optim...

Another nail in the coffin of cosine similarity! I started disliking cossim some years ago for several reasons, such as the non-linearity around 0.0 and the loss of certainty information due to the normalization of feature vectors, but this study seems to give another good reason to abandon it.
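The normalization issue mentioned above can be shown in a few lines of numpy (a minimal sketch with made-up vectors, not from the study):

```python
import numpy as np

def cossim(a, b):
    # Normalizing both vectors discards their magnitudes entirely.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = 100.0 * a  # same direction, 100x the magnitude

# A large-norm ("confident") and a small-norm ("uncertain") feature vector
# pointing the same way are indistinguishable under cosine similarity:
assert np.isclose(cossim(a, b), cossim(a, a))
```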

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training The authors propose a pre-training framework using synthetic cross-modal data to enhance LoFTR and RoMa for matching across medical imaging modalities like CT, MR, PET, and SPECT. zju3dv.github.io/MatchAnything/

The GAN is dead; long live the GAN! A Modern Baseline GAN This is a very interesting paper, exploring making GANs simpler and more performant. abs: arxiv.org/abs/2501.05441 code: github.com/brownvc/R3GAN

Image matching and ChatGPT - new post in the wide baseline stereo blog. tl;dr: it is good, even feels human, but not perfect. ducha-aiki.github.io/wide-baselin...

Introducing ASAL: Automating the Search for Artificial Life with Foundation Models Blog: sakana.ai/asal/ We propose a new method called Automated Search for Artificial Life (ASAL) which uses foundation models to automate the discovery of the most interesting and open-ended artificial lifeforms!

My book is (at last) out, just in time for Christmas! A blog post to celebrate and present it: francisbach.com/my-book-is-o...

This paper looks interesting - it argues that you don’t need adaptive optimizers like Adam to get good gradient-based training; instead, you can just set a learning rate for different groups of units based on initialization: arxiv.org/abs/2412.11768 #MLSky #NeuroAI
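A hypothetical numpy sketch of the general idea (my own illustrative rule, not the paper’s exact scheme): fix one learning rate per parameter group from its initialization scale, then run plain SGD with no per-step adaptive statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with different fan-in, hence different initialization scales.
layers = {
    "w1": rng.normal(0, 1 / np.sqrt(64), size=(64, 32)),
    "w2": rng.normal(0, 1 / np.sqrt(32), size=(32, 10)),
}

# Per-group learning rates derived once from the init scale (hypothetical rule),
# instead of Adam's running first/second-moment estimates.
base_lr = 0.1
lrs = {name: base_lr * w.std() for name, w in layers.items()}

def sgd_step(params, grads, lrs):
    # Plain SGD, just with a group-specific constant step size.
    return {name: p - lrs[name] * grads[name] for name, p in params.items()}

grads = {name: np.ones_like(w) for name, w in layers.items()}
layers = sgd_step(layers, grads, lrs)
```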

New video! I revisit two recently published papers on the capacity of LLMs to lie and manipulate. Beyond the spectacular headlines like "o1 managed to escape!!!", what do these papers actually say? Well, we're going to find out. (link in the reply)

Excited to announce ScanNet++ v2!🎉 @awhiteguitar.bsky.social & Yueh-Cheng Liu have been working tirelessly to bring: 🔹1006 high-fidelity 3D scans 🔹+ DSLR & iPhone captures 🔹+ rich semantics Elevating 3D scene understanding to the next level!🚀 w/ @niessner.bsky.social kaldir.vc.in.tum.de/scannetpp

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵

Libraries and tools that every deep learning project should use: loguru, tqdm, torchmetrics, einops, Python 3.11, black. Optional: prettytable. Good for debugging: lovely_tensors. Any others I've missed? Below, a few words on each of them:

(1/2) 📢📢 GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion 📢📢 We reconstruct animatable Gaussian head avatars from monocular videos captured by commodity devices such as smartphones.

Schrödinger Bridge Flow for Unpaired Data Translation (by @vdebortoli.bsky.social et al.) It will take me some time to digest this article fully, but it's important to follow the authors' advice and read the appendices, as the examples are helpful and well-illustrated. 📄 arxiv.org/abs/2409.09347

The code for Simplified and Generalized Masked Diffusion for Discrete Data (Jiaxin Shi et al) has been released and a lecture by @arnauddoucet.bsky.social on this topic is also available! 🐍 Code: github.com/google-deepm... 📄 Article: arxiv.org/abs/2406.04329 📼 Video: www.youtube.com/watch?v=qj9B...

3D content creation with touch! We exploit tactile sensing to enhance geometric details for text- and image-to-3D generation. Check out our #NeurIPS2024 work on Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation: ruihangao.github.io/TactileDream... 1/3

Flow Matching Guide and Code arxiv.org/abs/2412.06264

(1/n) My favorite "optimizer" work of 2024: 📢 Introducing APOLLO! 🚀: SGD-like memory cost, yet AdamW-level performance (or better!). ❓ How much memory do we need for optimization states in LLM training ? 🧐 Almost zero. 📜 Paper: arxiv.org/abs/2412.05270 🔗 GitHub: github.com/zhuhanqing/A...

Normalizing Flows are Capable Generative Models Apple introduces TarFlow, a new Transformer-based variant of Masked Autoregressive Flows. SOTA on likelihood estimation for images, quality and diversity comparable to diffusion models. arxiv.org/abs/2412.06329

📢 PrEditor3D: Fast and Precise 3D Shape Editing 📢 We propose a training-free 3D shape editing approach that rapidly and precisely edits the regions intended by the user and keeps the rest as is.

Inventors of flow matching have released a comprehensive guide going over the math & code of flow matching! Also covers variants like non-Euclidean & discrete flow matching. A PyTorch library is also released with this guide! This looks like a very good read! 🔥 arxiv: arxiv.org/abs/2412.06264

What better time to announce a new paper than during NeurIPS and ACCV? happy happy happy to introduce NADA, our latest work on object detection in art! 🎨 with amazing collaborators: @patrick-ramos.bsky.social, @nicaogr.bsky.social, Selina Khan, Yuta Nakashima

New paper alert! Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data Training models requires massive amounts of labeled data. ZCore shows you that you need less labeled data to train good models. Paper Link: arxiv.org/abs/2411.15349 GitHub Repo: github.com/voxel51/zcore

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks abs: arxiv.org/abs/2412.04626 project page: bigdocs.github.io BigDocs-7.5M is a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks.

Publication-ready visualization of 3D objects and point clouds in seconds, using @blender.org and BlenderProc. hummat.github.io/bproc-pubvis/

Airborne #LiDAR has revolutionized the study of ancient rainforest civilizations by seeing through dense canopies. Yet archaeologists still annotate their data manually. Introducing Archaeoscape at #NeurIPS2024 —the first deep learning-scale, open-access archaeological dataset🧵👇

RoPE has been the one 💯 genuine upgrade to the vanilla Vaswani transformer. This beautiful blog post by Chris Fleetwood explains its significance and how rotating Q & K preserves meaning (magnitude) while encoding relative positions (angle shift) 🔥🔥
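Both properties are easy to check numerically; a minimal numpy sketch of standard RoPE (my own implementation, not the blog post's code):

```python
import numpy as np

def rope(x, pos, theta=10000.0):
    # Rotate consecutive dimension pairs of x by position-dependent angles.
    d = x.shape[-1]
    freqs = theta ** (-np.arange(0, d, 2) / d)  # one frequency per dim pair
    ang = pos * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Rotation preserves magnitude ("meaning")...
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))

# ...and the q-k dot product depends only on the relative position:
s1 = rope(q, 7) @ rope(k, 3)    # positions 7 and 3  -> offset 4
s2 = rope(q, 14) @ rope(k, 10)  # positions 14 and 10 -> same offset 4
assert np.isclose(s1, s2)
```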

A common question nowadays: Which is better, diffusion or flow matching? 🤔 Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
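The flow-matching side of that coin fits in a few lines; a minimal sketch in my own notation (linear interpolant with a Gaussian source, not code from the blog post):

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_training_pair(x1, rng):
    # Draw Gaussian noise (the same source distribution diffusion uses),
    # pick a random time, build the linear interpolant, and return the
    # constant velocity x1 - x0 that the network v(x_t, t) should regress.
    x0 = rng.normal(size=x1.shape)
    t = rng.uniform()
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    return t, xt, v_target

x1 = rng.normal(loc=3.0, size=5)  # a toy "data" sample
t, xt, v = fm_training_pair(x1, rng)
# The same (xt, target) pair can be re-parameterized as a diffusion
# noise-prediction target, which is why the two views are interchangeable.
```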

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 arxiv.org/abs/2411.19722 A thread 👇 1/

A paper a day, episode 15. You liked the matrix cookbook? You’re gonna love this one: 100 statistics inequalities, just for your personal enjoyment. As they say in French, moi j’ai Bienaymé cet article ! (a pun on "bien aimé", "really liked") arxiv.org/abs/2102.07234
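For flavor, the namesake classic, the Bienaymé–Chebyshev inequality (presumably among the 100): for a random variable $X$ with mean $\mu$ and finite variance $\sigma^2$,

```latex
\Pr\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^2}, \qquad k > 0.
```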

My deep learning course at the University of Geneva is available on-line. 1000+ slides, ~20h of screen-casts. Full of examples in PyTorch. fleuret.org/dlc/ And my "Little Book of Deep Learning" is available as a phone-formatted pdf (nearing 700k downloads!) fleuret.org/lbdl/