Profile avatar
gowthami.bsky.social
PhD-ing at UMD. Knows a little about multimodal generative models. Check out my website to learn more - https://somepago.github.io/
29 posts 2,170 followers 190 following

What’s the right resolution for such ontologies? 1,000-10,000 seems like the sweet spot. H/t @aneeshsathe.com aneeshsathe.com/2025/01/15/d...

About to send my last DLCT email of the year today (in 2 hours). Join the 7-year-old mailing list if you haven't heard of it. (And if you have heard of it but haven't joined, I trust that it's a well-thought-out decision that suits you best.) groups.google.com/g/deep-learn...

The recording of my #NeurIPS2024 workshop talk on multimodal iterative refinement is now available to everyone who registered: neurips.cc/virtual/2024... My talk starts at 1:10:45 in the recording. I believe this will be made publicly available eventually, but I'm not sure exactly when!

One of the best tutorials for understanding Transformers! 📽️ Watch here: www.youtube.com/watch?v=bMXq... Big thanks to @giffmana.ai for this excellent content! 🙌

Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social, Remi Emonet, and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition! We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
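(Not from the blog post itself, just a minimal sketch of the conditional flow matching loss under the straight-line / rectified-flow path, assuming a model that predicts the velocity field; see the post for the general construction and the intuition.)

```python
import torch

def conditional_flow_matching_loss(model, x1):
    """Minimal conditional flow matching loss with a linear interpolation path.

    Assumptions (not from the blog post): model(x_t, t) predicts the velocity
    field, and x1 is a batch of data samples.
    """
    x0 = torch.randn_like(x1)                                   # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1 - t) * x0 + t * x1                                 # point on the straight-line path
    target_velocity = x1 - x0                                   # constant velocity along that path
    pred = model(x_t, t.flatten())
    return torch.mean((pred - target_velocity) ** 2)
```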

congratulations, @ian-goodfellow.bsky.social, for the test-of-time award at @neuripsconf.bsky.social! this award reminds me of how GAN started with this one email ian sent to the Mila (then Lisa) lab mailing list in May 2014. super insightful and amazing execution!

Trying to build a "books you must read" list for my lab that everyone gets when they enter. Right now it's: - Sutton and Barto - The Structure of Scientific Revolutions - Strunk and White - Maybe "Prediction, Learning, and Games", TBD. Kinda curious what's missing in an RL / science curriculum

This is a good, simple paper that somehow nobody working on these things cites, or even seems to be aware of: arxiv.org/abs/2406.05213 The idea seems useful; it formulates subjective uncertainty for natural language generation in a decision-theoretic setup.

A real-time (or very fast) open-source txt2video model dropped: LTXV. HF: huggingface.co/Lightricks/L... Gradio: huggingface.co/spaces/Light... Github: github.com/Lightricks/L... Look at that prompt example, though. You need to be a proper writer to get that quality.

Perhaps an unpopular opinion, but I don't think the problem with Large Language Model evaluations is the lack of error bars.

let me say it once more: "the gap between OAI/Anthropic/Meta/etc. and a large group of companies all over the world you've never cared to know of, in terms of LM pre-training? tiny"

The return of the Autoregressive Image Model: AIMv2 now going multimodal. Excellent work by @alaaelnouby.bsky.social & team with code and checkpoints already up: arxiv.org/abs/2411.14402

Interesting paper on arXiv this morning: arxiv.org/abs/2411.13683 It's a video masked autoencoder in which you learn which tokens to mask, so you process fewer of them and can scale to longer videos. It's a #NeurIPS2024 paper, apparently. I wonder if there could be such a strategy in the pure generative setup.
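(Not the paper's method, just a toy sketch of the general idea of processing a learned subset of tokens: a small scorer keeps the top-k tokens and the encoder only sees those. All names and shapes here are my own assumptions.)

```python
import torch
import torch.nn as nn

class LearnedTokenSelector(nn.Module):
    """Toy sketch: score video tokens and keep only the top-k for the encoder.

    Illustrative only; this is not the mechanism from arxiv.org/abs/2411.13683.
    """
    def __init__(self, dim, keep_ratio=0.25):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens):            # tokens: (batch, num_tokens, dim)
        scores = self.scorer(tokens).squeeze(-1)                 # (batch, num_tokens)
        k = max(1, int(self.keep_ratio * tokens.shape[1]))
        idx = scores.topk(k, dim=1).indices                      # indices of kept tokens
        kept = torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        )
        return kept, idx                                         # encoder sees only `kept`
```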

I’m not getting notifications for comments here; is anyone facing the same issue?

Discrete diffusion has become a very hot topic again this year. Dozens of interesting ICLR submissions and some exciting attempts at scaling. Here's a bibliography on the topic from the Kuleshov group (my open office neighbors). github.com/kuleshov-gro...

I only found out today that this awesome diffusion starter pack exists! I’ll try to fill up my generative models pack with some complementary folks. :)

Can people create accounts here without invite now? 🤔

I would miss not having a character limit, since my rants have grown longer the longer I’ve been in grad school! 😅

Started a list of some researchers working on image/video generation. (Not comprehensive at all) Reply with a paper link and TLDR to get added to the list! I request all grad students to not feel imposter-y and just reply if you work in this field! #computervision #diffusion go.bsky.app/SP1uWoE

www.astralcodexten.com/p/how-did-yo...

My growing list of #computervision researchers on Bsky. Did I miss you? Let me know. go.bsky.app/M7HGC3Y

I think we broke the app! I’m trying to retweet something and it’s not working! 😅

Need a bookmarks feature asap! 🥺 @bsky.app

I'm slowly putting my intro to ML course material on github, starting with the lab sessions: github.com/davidpicard/... These are self-contained notebooks in which you have to implement famous algorithms from the literature (k-NN, SVM, DT, etc), with a custom dataset that I (painstakingly) made!
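(A tiny sketch in the spirit of those exercises, not code from the repo: a k-NN classifier by majority vote, assuming integer class labels.)

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Minimal k-NN classifier: majority label among the k nearest training points.

    Uses Euclidean distance; assumes y_train contains non-negative integer labels.
    """
    # pairwise squared distances between test and train points
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest neighbors
    votes = y_train[nearest]                       # their labels, shape (n_test, k)
    # majority vote per test point
    return np.array([np.bincount(v).argmax() for v in votes])
```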

What are some must-read multimodal generation papers? I am looking for vision-language models (preferably jointly trained from scratch). Some examples - Chameleon - arxiv.org/abs/2405.09818 Transfusion - arxiv.org/abs/2408.11039 JanusFlow - arxiv.org/abs/2411.07975 #computervision #multimodal

6 years ago, in the days of GAN art, I wrote an article called "Can Computers Create Art?", arguing that computers should not be considered artists, regardless of how good image generation gets. People sometimes ask if my views have changed. I say, 1/🧵 www.mdpi.com/2076-0752/7/...

Here's my attempt at assembling a starter pack of the still nascent computer vision community on Bluesky. Feel free to recommend other accounts that should be in this starter pack. :) #computervision (Do we use hashtags here? 😅) go.bsky.app/PkAKJu5