This is super exciting! I've been hanging out for a modern uplift to BERT-style models with larger context windows. 512 tokens is pretty limiting for a number of use cases I've had. I have yet to dig in, but it looks like awesome work! #MLSky #DataBS #NLP
Reposted from Jeremy Howard
I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵