LLaVA-Mini🔥 An efficient multimodal model for image and video understanding, released by the Chinese Academy of Sciences
Paper: https://huggingface.co/papers/2501.03895
Model: https://huggingface.co/ICTNLP/llava-mini-llama-3.1-8b
✨ Matches LLaVA-v1.5 using just 1 vision token
✨ Delivers <40ms response time
