Efficiency is not only about speed. ModernBERT is also memory-friendly and can handle larger batch sizes than previous encoders, which comes in handy for contrastive learning or for running on smaller GPUs (an important use case for encoders)
https://huggingface.co/answerdotai/ModernBERT-base
https://huggingface.co/answerdotai/ModernBERT-large
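If you just want to poke at a checkpoint, here is a minimal masked-LM sketch (assuming a transformers release recent enough to ship ModernBERT support; the example sentence is just a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill a masked token, just to check the checkpoint is wired up correctly
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the most likely token at the masked position
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```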
If you want all the details, please have a look at the nicely written blog post and the very detailed paper
I'll go on with some less general, more personal notes
We ran a lot of experiments on ColBERT models for the paper, with tons of different base models
PyLate handled it all, even models using half-baked remote code
This was a really cool stress test and I am really happy it went so smoothly
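For a sense of what that workflow looks like, here is a rough sketch of wrapping one of the checkpoints with PyLate (the query and document strings are made-up placeholders):

```python
from pylate import models

# Wrap a ModernBERT checkpoint as a ColBERT-style late-interaction model
model = models.ColBERT(model_name_or_path="answerdotai/ModernBERT-base")

# Queries and documents are encoded separately; is_query switches the query-side processing
query_embeddings = model.encode(["what is late interaction retrieval?"], is_query=True)
document_embeddings = model.encode(
    ["ColBERT keeps one vector per token and scores with a MaxSim over the query tokens."],
    is_query=False,
)
```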
The ModernBERT-base checkpoint achieves a BEIR average of 51.3
This means we beat e5 with less than 45 minutes of training on MS MARCO only (while using only half of the memory of our 8x100)
But I am very grateful to everyone involved in this project: I truly learned a lot, and I am so proud of the models we managed to build together
Besides the models, I think we showed that there is still a lot to be done, and I hope we succeed in reigniting interest in encoder pre-training 🔥