You can't train a text-to-image model without text. Here's my latest blog post on how to caption large datasets with SmolVLM2, Moondream2, and Qwen 2.5 VL.

https://medium.com/@geronimo7/image-captioning-on-multiple-gpus-0a50cecbdcc4
