Does anyone know how to get image representations from the Llama-3.2-11B-Vision models? Specifically, I'm looking for a pooled representation of the input image.
