adinayakup.bsky.social • 49 days ago

LLaVA-Mini🔥 An efficient multimodal model for image and video understanding, released by the Chinese Academy of Sciences
Paper: https://huggingface.co/papers/2501.03895
Model: https://huggingface.co/ICTNLP/llava-mini-llama-3.1-8b
✨ Matches LLaVA-v1.5 using just 1 vision token
✨ Delivers <40ms response time