What can you do with multimodal LLMs? How about identifying objects by name, description, color, and even drawing a bounding box around them?
🖼️ ➡️ 📄
Gemini makes it possible, Genkit makes it simple.
🖼️ ➡️ 📄
Gemini makes it possible, Genkit makes it simple.
Comments
Link: https://examples.genkit.dev/image-analysis