We recently posted our paper on arXiv and are sharing it here too: https://arxiv.org/abs/2412.14164v1 - work led by our amazing intern Peter Tong. Key findings:
- LLMs can be trained to generate visual embeddings!! (toy sketch after this list)
- VQA data appears to help a lot in generation!
- Better understanding = better generation!
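To make the first finding concrete, here is a minimal sketch (not the paper's implementation) of what "training an LLM to generate visual embeddings" can look like: a small regression head maps LLM hidden states into a vision encoder's embedding space and is trained with a cosine-similarity loss. The head architecture, dimensions, and loss here are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class VisualEmbeddingHead(nn.Module):
    """Projects LLM hidden states into a (frozen) vision encoder's embedding space."""
    def __init__(self, llm_dim: int = 4096, vision_dim: int = 1152):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(llm_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, vision_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

def visual_embedding_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity regression loss between predicted and target embeddings."""
    return (1.0 - nn.functional.cosine_similarity(pred, target, dim=-1)).mean()

# Toy usage: hidden states at image-token positions -> predicted visual embeddings.
head = VisualEmbeddingHead()
hidden = torch.randn(2, 256, 4096)   # (batch, image tokens, LLM hidden dim) - made up
target = torch.randn(2, 256, 1152)   # stand-in for frozen vision-encoder features
loss = visual_embedding_loss(head(hidden), target)
loss.backward()
```

The predicted embeddings could then be decoded into pixels by a separate generator conditioned on them; that part is omitted here.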
Excited to see how the community "MetaMorphs" existing LLMs!