OmniVision-968M: a new local VLM for edge devices, fast & small but performant

It's based on SigLIP-so-400M (vision encoder) and Qwen-2.5-0.5B (language model)
💨 9x fewer image tokens, super efficient
📖 aligned with SFT and DPO to reduce hallucinations
🔥 Apache 2.0 license
Demo https://hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
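A rough sketch of where a "9x fewer image tokens" figure can come from. It assumes the vision encoder emits 27 x 27 = 729 patch tokens (SigLIP-so-400M at 384 px with 14 px patches) and that every run of 9 consecutive tokens is concatenated into one wider token before a projector maps it into the language model's embedding space. The hidden sizes, the grouping and the random `project` weights are illustrative assumptions, not NexaAI's actual implementation.

```python
import numpy as np

# Assumed sizes, for illustration only.
NUM_TOKENS = 27 * 27   # 729 patch tokens from the vision encoder (assumed)
VISION_DIM = 1152      # assumed vision-encoder hidden size
LM_DIM = 896           # assumed language-model hidden size
MERGE = 9              # merge 9 tokens into 1 -> 9x fewer image tokens


def compress_tokens(features: np.ndarray, merge: int = MERGE) -> np.ndarray:
    """Concatenate each run of `merge` consecutive tokens along the feature axis.

    (num_tokens, dim) -> (num_tokens // merge, dim * merge)
    """
    n, d = features.shape
    if n % merge != 0:
        raise ValueError("token count must be divisible by the merge factor")
    # Row-major reshape: row i of the result is tokens [i*merge, (i+1)*merge) laid end to end.
    return features.reshape(n // merge, d * merge)


def project(compressed: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # Hypothetical linear projector into the language model's embedding space.
    return compressed @ weight


rng = np.random.default_rng(0)
features = rng.standard_normal((NUM_TOKENS, VISION_DIM)).astype(np.float32)
compressed = compress_tokens(features)
weight = (rng.standard_normal((VISION_DIM * MERGE, LM_DIM)) * 0.01).astype(np.float32)
image_embeds = project(compressed, weight)
print(features.shape, "->", image_embeds.shape)  # (729, 1152) -> (81, 896)
```

The language model then sees 81 image tokens instead of 729, which is where the speedup on edge devices would come from: fewer tokens in the prompt means less attention and KV-cache work per image.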