You can't train a text-to-image model without text. Here's my latest blog post on how to caption large datasets with SmolVLM2, Moondream2, and Qwen 2.5 VL:
https://medium.com/@geronimo7/image-captioning-on-multiple-gpus-0a50cecbdcc4