This question is very dear to my research heart, and I've been following it closely!
The BabyLM Challenge also introduced a multimodal track 1-2 years ago to test whether multimodal data helps with language acquisition, but again, no clearly positive results came out of it:
https://babylm.github.io/