What can you do with multimodal LLMs? How about identifying objects by name, description, and color, and even drawing bounding boxes around them?

🖼️ ➡️ 📄

Gemini makes it possible, Genkit makes it simple.
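
Once the model hands back a box, drawing it is mostly a scaling problem. Here is a minimal sketch, assuming each detected object arrives with a box in `[yMin, xMin, yMax, xMax]` order normalized to 0..1000. Check that against your own prompt and output schema. The `DetectedObject` shape, its field names, and the sample values are hypothetical, made up for illustration, and not a schema defined by Gemini or Genkit.

```typescript
// Hypothetical shape for one detected object (field names are assumptions).
interface DetectedObject {
  name: string;
  description: string;
  color: string;
  // Assumed convention: [yMin, xMin, yMax, xMax], each normalized to 0..1000.
  box2d: [number, number, number, number];
}

// A rectangle in image pixels, ready for a canvas or an SVG overlay.
interface PixelBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Scale a normalized box to the pixel dimensions of the source image.
function toPixelBox(
  obj: DetectedObject,
  imageWidth: number,
  imageHeight: number,
): PixelBox {
  const [yMin, xMin, yMax, xMax] = obj.box2d;
  const left = Math.round((xMin / 1000) * imageWidth);
  const top = Math.round((yMin / 1000) * imageHeight);
  const right = Math.round((xMax / 1000) * imageWidth);
  const bottom = Math.round((yMax / 1000) * imageHeight);
  return { left, top, width: right - left, height: bottom - top };
}

// Made-up sample data, not real model output.
const sample: DetectedObject = {
  name: "mug",
  description: "ceramic coffee mug",
  color: "red",
  box2d: [250, 100, 750, 500],
};

const box = toPixelBox(sample, 1920, 1080);
console.log(`${sample.color} ${sample.name}:`, box);
```

The resulting `left`, `top`, `width`, and `height` can go straight into something like a canvas `strokeRect` call.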