geronimo-ai.bsky.social
i like neural networks blog: https://medium.com/@geronimo7 pet project: https://snapfiddle.ai - Image editor with inpainting
541 posts 885 followers 2,312 following

Good friend of mine started a podcast. If you're into sports, triathlon, overcoming serious setbacks, following your dreams - this is for you www.youtube.com/watch?v=rj0k...

Creator of Stable Diffusion casually talking about his latest adventure: FLUX www.youtube.com/watch?v=nrKK...

No training of a text-to-image model without text. Here's my latest blog post on how to caption large datasets with SmolVLM2, Moondream2, and Qwen 2.5 VL medium.com/@geronimo7/i...

i'm working on a mini diffusion model. from scratch. trained on imagenet. it seems to struggle with the anatomy of airplanes. dogs are easy. why

Deepseek released Flash MLA code github.com/deepseek-ai/...

github.com/openai/SWELa...

Revealing Hidden Generative Capabilities in Discriminative Models github.com/stanislavfor... arxiv.org/pdf/2502.07753

i've built an object remover app that runs entirely in the browser! All of the data stays on your computer demo: next-lama.vercel.app

Deep Dive into LLMs like ChatGPT by @karpathy.bsky.social www.youtube.com/watch?v=7xTG...

huggingface.co/mistralai/Mi...

Love HF 🤗 Started an effort to reproduce R1 github.com/huggingface/...

4-bit Sana released demo: svdquant.mit.edu github.com/NVlabs/Sana/...

HAN lab released v1.1 of Sana's DC-AE huggingface.co/mit-han-lab/...

FluxEdit github.com/sayakpaul/fl...

holy sh, deepseek just delivered github.com/deepseek-ai/...

Flux-dev ControlNet Model huggingface.co/sayakpaul/ed...

NVIDIA AceInstruct-72B: a family of advanced SFT models for coding, mathematics, and general-purpose tasks research.nvidia.com/labs/adlr/ac... huggingface.co/nvidia/AceIn...

Musk Zuckerberg

FLUX Pro Finetuning API announced blackforestlabs.ai/announcing-t... $2-$6 per finetuned model

#psychology #perception #meme

transformers.js support for Kokoro! Kokoro is the #1 text-to-speech model on the leaderboard: huggingface.co/spaces/Pendr... thanks to @xenova.bsky.social and transformers.js it now runs in the browser!! huggingface.co/onnx-communi...

Tensor Product Attention Is All You Need "a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly shrinking KV cache size at inference time" code: github.com/tensorgi/T6 arxiv.org/abs/2501.06425

"Titans" implementation WIP github.com/lucidrains/t...

Sana 4096x4096px + 1024x1024px center crop

Sana 4k generates pretty impressive images, same old struggle with human anatomy though.

Generate 16 megapixels of weird fruits with Sana 4k! Runs on 24GB VRAM, cuda and mps thanks to a recent PR in diffusers, takes ~40s/img on an RTX 4090. Code 👇 github.com/geronimi73/3...

Sana 4k released huggingface.co/Efficient-La...

LLM aha moment 🤯 after the last decoder block, the last token contains ALL the context and only this information is used to generate the next token www.reddit.com/r/learnmachi...
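
A toy sketch of that idea, not how any real LLM is implemented: one causal self-attention layer in NumPy, with made-up random weights and hypothetical names (`causal_self_attention`, `next_token_logits`, `W_out`). Under those assumptions it shows that the next-token logits are read from the last position only, while that position has already mixed in the earlier tokens through attention.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, d) sequence (toy version)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions so token t only attends to tokens <= t.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax (masked entries become exactly 0).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def next_token_logits(x, Wq, Wk, Wv, W_out):
    """Logits for the next token, read from the LAST position only."""
    h = x + causal_self_attention(x, Wq, Wk, Wv)  # residual connection
    return h[-1] @ W_out  # rows h[0..T-2] are ignored at this step

# Hypothetical sizes: 5 tokens, width 8, vocabulary of 11.
rng = np.random.default_rng(0)
T, d, V = 5, 8, 11
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_out = rng.normal(size=(d, V))

logits = next_token_logits(x, Wq, Wk, Wv, W_out)
print(logits.shape)
```

Nudging the first token in `x` changes `logits`, even though only `h[-1]` is multiplied by `W_out`: the context reaches the last position through attention.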

Deep Learning course by Han lab (MIT) lecture videos + slides hanlab.mit.edu/courses/2024...

Key-value memory is an important concept in modern machine learning (e.g., transformers). Ila Fiete, Kazuki Irie, and I have written a paper showing how key-value memory provides a way of thinking about memory organization in the brain: arxiv.org/abs/2501.02950

X is finished

Wrote Part 2 of my now ankle-deep dive into NVIDIA's Sana, this time looking at its Diffusion Transformer component: medium.com/@geronimo7/u...

PdfItDown: everything -> pdf built on top of markitdown