merve.bsky.social
proud mediterranean 🧿 open-sourceress at hugging face 🤗 multimodality, zero-shot vision, vision language models, transformers
240 posts 8,213 followers 675 following

Why do people sleep on DSE multimodal retrieval models? 👀 They're just like ColPali, but highly scalable, fast and you can even make them more efficient with binarization or matryoshka with little degradation 🪆⚡️ I collected some here huggingface.co/collections/...
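To see why binarization and matryoshka truncation barely hurt retrieval, here's a toy sketch (plain Python, hypothetical vectors — not the models' actual code): a query still ranks a similar document above a dissimilar one after truncating the embedding to half its dimensions, and again after keeping only one sign bit per dimension.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def matryoshka(vec, dim):
    # Matryoshka embeddings: the first `dim` components already form a usable embedding
    return vec[:dim]

def binarize(vec):
    # Binarization: keep only the sign of each component (1 bit per dimension)
    return [1 if x > 0 else 0 for x in vec]

def hamming_sim(a, b):
    # Similarity for binary vectors: count matching bits
    return sum(x == y for x, y in zip(a, b))

query = [0.9, -0.2, 0.4, 0.1, -0.7, 0.3, 0.0, -0.1]
doc_a = [0.8, -0.3, 0.5, 0.2, -0.6, 0.2, 0.1, -0.2]   # similar to query
doc_b = [-0.5, 0.7, -0.4, -0.3, 0.6, -0.2, -0.1, 0.4]  # dissimilar

# Full-precision ranking...
assert cosine(query, doc_a) > cosine(query, doc_b)
# ...survives truncation to half the dimensions (matryoshka)...
assert cosine(matryoshka(query, 4), matryoshka(doc_a, 4)) > \
       cosine(matryoshka(query, 4), matryoshka(doc_b, 4))
# ...and survives 1-bit binarization (32x smaller index)
assert hamming_sim(binarize(query), binarize(doc_a)) > \
       hamming_sim(binarize(query), binarize(doc_b))
print("ranking preserved under truncation and binarization")
```

Same idea at scale: store binarized or truncated embeddings for cheap first-stage retrieval, and optionally rescore the top hits with the full vectors.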

I'm so hooked on @hf.co Inference Providers (specifically Qwen2.5-VL-72B) for multimodal agentic workflows with smolagents 🥹 get started ⤵️ > filter models by provider > test them through the widget or Python/JS/cURL

my weekly summary on what's released in open AI is up on @hf.co huggingface.co/posts/merve/... collection is here huggingface.co/collections/...

fan-favorite open-source PDF OCR model OlmOCR gets faster and more efficient ⚡️ RolmOCR-7B follows the same recipe as OlmOCR: it builds on Qwen2.5VL with training-set modifications and improves accuracy & performance 🤝 huggingface.co/reducto/Rolm...

Hello friends 👋🏼 If you visit Turkey this summer, know that millions of Turkish people are boycotting: once a week they buy nothing, and the rest of the week only necessities. If you have plans, here's a post that summarizes where you should buy from www.instagram.com/share/BADrkS...

SmolVLM paper is out and it's packed with great findings on training a good smol vision LM! Andi summarized them below, give it a read if you want to see more insights 🤠

DO NOT SLEEP ON THIS MODEL Kimi-VL-A3B-Thinking is the first capable open-source reasoning VLM with an MIT license ❤️ > it has only 2.8B activated params 👏 > it's agentic 🔥 works on GUIs > surpasses gpt-4o I've put it to the test (see below ⤵️) huggingface.co/spaces/moons...

InternVL3 is out 💥 > 7 ckpts with various sizes (1B to 78B) > Built on InternViT encoder and Qwen2.5VL decoder, improves on Qwen2.5VL > Can do reasoning, document tasks, extending to tool use and agentic capabilities 🤖 > easily use with Hugging Face transformers 🤗 huggingface.co/collections/...

Model Context Protocol has prompt injection security problems simonwillison.net/2025/Apr/9/m...

Xet infra now backs 1000s of repos on @hf.co , which means we get to put on our researcher hats and peer into the bytes 👀 🤓 Xet clients chunk files (~64KB) and skip uploads of duplicate content, but what if those chunks are already in _another_ repo? We skip those too.
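A simplified sketch of that dedup logic (stdlib only, and hypothetical fixed-size chunking — real Xet uses content-defined chunk boundaries): hash each ~64 KB chunk, and only upload hashes the shared store hasn't seen, whichever repo they first arrived from.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # ~64 KB chunks, as in the post

def chunks(data: bytes):
    # Simplification: fixed-size chunks; real Xet picks boundaries from content
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def upload(data: bytes, store: dict) -> int:
    """Send only chunks whose hash isn't in the shared store; return bytes sent."""
    sent = 0
    for c in chunks(data):
        digest = hashlib.sha256(c).hexdigest()
        if digest not in store:   # dedup works across *all* repos sharing the store
            store[digest] = c
            sent += len(c)
    return sent

store = {}                                      # chunk store shared by every repo
repo_a = b"A" * (128 * 1024)                    # two identical 64 KB chunks
repo_b = b"A" * (64 * 1024) + b"B" * (64 * 1024)

print(upload(repo_a, store))  # second chunk duplicates the first -> 65536 bytes sent
print(upload(repo_b, store))  # the "A" chunk already exists from repo_a -> 65536
```

The cross-repo skip is just the `digest not in store` check operating on one global store instead of a per-repo one.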


Due to X's policies, I'll also be sharing my work-related posts here; feel free to follow 😊

icymi I shipped a tutorial on fine-tuning vision language models on videos ⏯️ learn how to fine-tune SmolVLM2 on Video Feedback dataset 📖 github.com/merveenoyan/...

All the multimodal document retrieval models (ColPali, DSE et al) are now under visual document retrieval at @hf.co 📝🤗 take your favorite VDR model out for multimodal RAG 🤝

Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀

Everything released this past week in open AI 🤠 > Link to all models, datasets, demos huggingface.co/collections/... > Text-readable version is here huggingface.co/posts/merve/...

there's a new multimodal retrieval model in town 🤠 @llamaindex.bsky.social released vdr-2b-multi-v1 > uses 70% fewer image tokens yet outperforms other dse-qwen2-based models > 3x faster inference with less VRAM 💨 > shrinkable with matryoshka 🪆 huggingface.co/collections/...

What a week to open the year in open ML, all the things released at @hf.co 🤠 Here's everything released, find text-readable version here huggingface.co/posts/merve/... All models are here huggingface.co/collections/...

ViTPose -- the best open-source pose estimation model just landed in @hf.co transformers 🕺🏻💃🏻 🔖 Model collection: huggingface.co/collections/... 🔖 Notebook on how to use: colab.research.google.com/drive/1e8fcb... 🔖 Try it here: huggingface.co/spaces/hysts...

ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2 with an MIT license 💗 The models are capable of vision-language understanding and visual referring tasks (referring segmentation) for both images and videos ⏯️

supercharge your LLM apps with smolagents 🔥 however cool your LLM is, without being agentic it can only go so far enter smolagents: a new agent library by @hf.co to make the LLM write code, do analysis and automate boring stuff! huggingface.co/blog/smolage...

ColPali has landed in @hf.co transformers and I have just shipped a very lean fine-tuning tutorial in smol-vision 🤠💗 QLoRA fine-tuning in 4-bit with a batch size of 4 fits in 32 GB VRAM and is very fast! ✨ github.com/merveenoyan/...

you can now stay up-to-date with big AI research labs' updates on @hf.co easily via the org activity page 🥹 I have been looking forward to this feature, as back-to-back releases are overwhelming and I tend to miss out 🤠

BERT is so back 🔥 Answer.AI and LightOn released ModernBERT: a lightning-fast state-of-the-art BERT model with an Apache 2.0 license 🥹 2x as fast as DeBERTaV3 and 3x faster than nomic 💨 all models are here hf.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb read more hf.co/blog/modernbert 📖

Aya by Cohere For AI can now see! 👀 The C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works in 8 languages! 🗣️ The authors extend the Llava dataset with 558k examples using Aya's translation capabilities! works very well ⬇️ huggingface.co/spaces/kkr51...

VLMs go MoE ✨ DeepSeek AI dropped three new commercially permissive vision LMs based on SigLIP encoder and their DeepSeek-MoE decoder 🐳 the models come in 1.0B, 2.8B and 4.5B active params 🥹 models seem to catch up with state-of-the-art with less active parameters! huggingface.co/collections/...

Learn how to build a complete multimodal RAG pipeline with ColQwen2 as retriever, MonoQwen2-VL as reranker, Qwen2-VL as VLM in this notebook that runs on a GPU as small as L4 🔥 huggingface.co/learn/cookbo...

This week in open-source AI was insane 🤠 A small recap🕺🏻 Text-readable version is here huggingface.co/posts/merve/... Collection to all models, datasets, demos is here huggingface.co/collections/...

'tis the season of open-source video models 🎄📹⚡️ Tencent just dropped the weights for ✨HunyuanVideo✨ - 13B parameters - competitive with closed source - code & weights released - demo coming soon 🔥

this was SmolVLM fine-tuning, and you can also do this! 🤗 I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 below snapshot is with bsz=4 with simulated bsz=16 on L4 🤠 github.com/huggingface/...
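The bsz=4 / simulated bsz=16 trick is gradient accumulation, and it's easy to verify with a toy sketch (plain Python scalar model, not the notebook's code): summing four micro-batch gradients, each scaled by 1/4, gives exactly the gradient of the full batch of 16.

```python
# Toy model: loss(w) = mean over the batch of (w - x)^2, so grad = 2 * mean(w - x).
def grad(w, batch):
    return sum(2 * (w - x) for x in batch) / len(batch)

data = [float(i) for i in range(16)]  # one "effective batch" of 16 samples
w = 0.5

# One optimizer step on the full batch of 16:
full = grad(w, data)

# Gradient accumulation: 4 micro-batches of 4, one optimizer step at the end.
accum = 0.0
for i in range(0, 16, 4):
    micro = data[i:i + 4]
    accum += grad(w, micro) / 4  # scale each micro-batch grad by 1/accumulation_steps

assert abs(accum - full) < 1e-9  # identical update, a quarter of the peak memory
print(accum)
```

That's why it pairs so well with QLoRA and gradient checkpointing on a small GPU like an L4: each trades compute or precision for memory without changing the effective batch size.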

So many open-source and open releases last week! Here's a recap, find the text-readable version here huggingface.co/posts/merve/...

it only takes a single CLI command to kick-off a Direct Preference Optimization fine-tuning run on SmolVLM huggingface.co/blog/smolvlm... you're welcome