tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer · Machine Learning Engineer at 🤗 Hugging Face

We've just released MMTEB, our multilingual upgrade to the MTEB Embedding Benchmark! It's a huge collaboration across 56 universities, labs, and organizations, resulting in a massive benchmark of 1000+ languages, 500+ tasks, and a dozen+ domains. Details in 🧵
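
A minimal sketch of evaluating a model on a benchmark task with the `mteb` package; the model and task picked here are illustrative assumptions, not from the announcement:

```python
# Hedged sketch: evaluate a Sentence Transformers model on one MTEB task.
# "Banking77Classification" and the model name are illustrative choices.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```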

The folks at Nomic just released nomic-embed-text-v2-moe, a MoE (Mixture of Experts) embedding model with state-of-the-art retrieval performance across ~100 languages. I'd never seen a MoE embedding model until now, so it's very nice to see one! More in 🧵
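
Loading it with Sentence Transformers could look roughly like this; the trust_remote_code flag and the "search_query:"/"search_document:" prefixes follow Nomic's usual conventions and are assumptions here:

```python
# Hedged sketch of retrieval with nomic-embed-text-v2-moe.
from sentence_transformers import SentenceTransformer

# Nomic models typically ship custom code, hence trust_remote_code=True (assumption).
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

# Nomic convention (assumed here): prefix queries and documents differently.
query_embeddings = model.encode(["search_query: What is a Mixture of Experts model?"])
doc_embeddings = model.encode(["search_document: MoE layers route each token to a few experts."])

print(model.similarity(query_embeddings, doc_embeddings))
```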

Today, The Minish Lab released 2 more Static Embedding models: potion-base-32M & potion-retrieval-32M, with stronger performance than before while still easily processing e.g. 50k sentences per second. The text embeddings can be used for retrieval, classification, clustering, etc. 🧵
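
A minimal usage sketch, assuming these load through the `model2vec` package like earlier Minish Lab releases; the sentences are illustrative:

```python
# Hedged sketch: encode with a potion Static Embedding model via model2vec.
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-32M")
embeddings = model.encode(["It's so fast!", "Static embeddings scale to huge corpora."])
print(embeddings.shape)
```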

Sentence Transformers v3.4.1 is out! A nice and small release bringing you instant access to some nice and small models: Model2Vec compatibility! These types of models are extremely fast (e.g. 400x faster than small transformer-based models), while only being a tad worse. 🧵
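
A sketch of what that compatibility could look like, assuming the StaticEmbedding.from_model2vec loader; the Model2Vec model chosen here is an illustrative pick:

```python
# Hedged sketch: wrap a Model2Vec model as a Sentence Transformers model.
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

static = StaticEmbedding.from_model2vec("minishlab/potion-base-8M")  # assumed loader
model = SentenceTransformer(modules=[static])
embeddings = model.encode(["Model2Vec models are tiny and quick."])
```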

I just released Sentence Transformers v3.4.0, featuring a memory leak fix (memory not being cleared upon model & trainer deletion), compatibility between the powerful Cached... losses and the Matryoshka loss modifier, and a bunch of fixes & small features. Details in 🧵
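
Combining a Cached... loss with the Matryoshka modifier might look like this; the base model and the dimensions are illustrative assumptions:

```python
# Hedged sketch: Matryoshka-wrap a cached in-batch negatives loss.
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import (
    CachedMultipleNegativesRankingLoss,
    MatryoshkaLoss,
)

model = SentenceTransformer("microsoft/mpnet-base")
base_loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)
# Train the same embeddings at several truncated dimensionalities at once.
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```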

🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 models with training scripts, datasets, metrics, evals, ideation, all public. Details in 🧵
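
As a rough sketch, initializing such a trainable static embedding model with the StaticEmbedding module could look like this; the tokenizer choice and embedding_dim are illustrative assumptions:

```python
# Hedged sketch: a from-scratch static embedding model, ready for training.
from tokenizers import Tokenizer
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

tokenizer = Tokenizer.from_pretrained("google-bert/bert-base-uncased")  # assumed choice
static = StaticEmbedding(tokenizer, embedding_dim=1024)
model = SentenceTransformer(modules=[static])
# From here, train with the standard SentenceTransformerTrainer and a contrastive loss.
```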