ljupco.bsky.social
Now: quant trading & R&D. Prior: ASR in noise, synthesis, ML. In Harpenden, UK. From Skopje, MK. Open weights, open thoughts, free AI computation for e/acc - now! πŸ₯° Bsky https://tinyurl.com/3nxj7pcc (tap Latest) Home https://ljubomirj.github.io Hi! 😊
318 posts 644 followers 744 following
comment in response to post
files-to-prompt datasette -e py -c | \
  llm -m o3-mini -s \
  'write extensive documentation for how the permissions system works, as markdown'
Gave me this: gist.github.com/simonw/4a13c...
comment in response to post
By being integrated in Sentence Transformers without any other dependencies, all Static Embedding models work out of the box in all projects that integrate with Sentence Transformers, like: - @langchain.bsky.social
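A minimal sketch of what "out of the box" can look like through LangChain's Sentence Transformers wrapper; the langchain-huggingface package and the model id below are assumptions for illustration, not the post's own example:

# Sketch: a Static Embedding model used via LangChain's HuggingFaceEmbeddings,
# which delegates to Sentence Transformers under the hood.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/static-retrieval-mrl-en-v1")  # assumed id
vector = embeddings.embed_query("How do static embeddings work?")
print(len(vector))  # embedding dimensionality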
comment in response to post
πŸ“ Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more. πŸͺ† Matryoshka support: allow you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% performance decrease for English Similarity tasks) 🧡
comment in response to post
The Static Embedding models have some excellent properties: 🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for all-mpnet-base-v2 and 56 for gte-large-en-v1.5 πŸ“ No maximum sequence length! Embed texts at any length (at your own risk) 🧡
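A rough way to measure that kind of throughput on your own machine; the model id is an assumption, and batch size and hardware will change the number:

import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1", device="cpu")  # assumed id
sentences = ["An example sentence to embed."] * 10_000
start = time.perf_counter()
model.encode(sentences, batch_size=1024)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.0f} sentences/second")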
comment in response to post
Static Embedding models have been around since before Transformers (e.g. GloVe, word2vec): they work with pre-computed word embeddings from a mapping. I apply this simple architecture, but train it like a modern embedding model: Contrastive Learning with Matryoshka support. 🧡
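The underlying idea as a tiny toy sketch: each token has a pre-computed vector in a lookup table (the "mapping"), and a sentence embedding is just the mean of its token vectors, with no attention involved, which is where the speed comes from:

import numpy as np

# Toy vocabulary and pre-computed token embedding table; in practice this is trained and far larger.
vocab = {"static": 0, "embeddings": 1, "are": 2, "fast": 3}
table = np.random.rand(len(vocab), 8)

def embed(sentence: str) -> np.ndarray:
    ids = [vocab[token] for token in sentence.lower().split() if token in vocab]
    return table[ids].mean(axis=0)  # mean pooling over token vectors

print(embed("static embeddings are fast").shape)  # (8,)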
comment in response to post
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation πŸ“œ my training scripts, using the Sentence Transformers library πŸ“Š my Weights & Biases reports with losses & metrics πŸ“• my list of 30 training and 13 evaluation datasets 🧡
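A minimal sketch of what such a training setup can look like with the Sentence Transformers v3 trainer, assuming an in-batch-negatives contrastive loss wrapped in a Matryoshka loss; the model id, dataset, and dimensions are placeholders, not the actual scripts:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("path-or-id-of-a-static-embedding-model")  # placeholder
train_dataset = Dataset.from_dict({
    "anchor": ["what is a static embedding?", "fastest embedding model on cpu"],
    "positive": ["Static embeddings map tokens to pre-computed vectors.",
                 "Static models embed over 100k sentences per second on CPU."],
})

# Contrastive learning with in-batch negatives, wrapped so truncated embeddings are also trained.
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[1024, 512, 256, 128, 64])

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()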
comment in response to post
We apply our recipe to train 2 Static Embedding models that we release today: 2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (classification, clustering, etc.), both Apache 2.0. Fully integrated in Sentence Transformers, etc. 🧡
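A sketch of loading the two released models; the ids below are assumptions, check the blog post for the exact names:

from sentence_transformers import SentenceTransformer

retrieval = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")            # assumed id
similarity = SentenceTransformer("sentence-transformers/static-similarity-mrl-multilingual-v1")  # assumed id

query_emb = retrieval.encode("how to train static embeddings")
doc_embs = retrieval.encode(["A blog post on training static embedding models.",
                             "A recipe for chocolate cake."])
print(retrieval.similarity(query_emb, doc_embs))  # higher score for the relevant document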
comment in response to post
Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: hf.co/blog/static-... Or read more in this thread first 🧡
comment in response to post
β€’ Want to find your friends from other networks? Check out the Sky Follower Bridge Chrome extension built by an independent developer! chromewebstore.google.com/detail/sky-f...
comment in response to post
This article www.richardhanania.com/p/understand... articulates well, I thought, and delineates the new ideology of the Tech Right, and where it matches and where it contradicts the old American conservative Right. From Jun-2023, but the passage of time strengthens the confidence in the article imo.
comment in response to post
github.com/dockur/windows
comment in response to post
You can find the models here: huggingface.co/answerdotai/... huggingface.co/answerdotai/... If you want all the details, please have a look at the nicely written blog post and the very detailed paper. I'll go on with some less general and more personal information
comment in response to post
We also carefully designed the shapes of the models to optimize inference on common hardware. Coupled with unpadding through the full processing pipeline and Flash Attention, this allows ModernBERT to be two to three times faster than most encoders on long context on an RTX 4090
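A sketch of loading it with Flash Attention via transformers; the checkpoint id is an assumption, and the flash-attn package plus a supported GPU are required, otherwise the default attention implementation is used:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn installed
).to("cuda")

inputs = tokenizer("ModernBERT handles long context efficiently.", return_tensors="pt").to("cuda")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)  # (batch, sequence length, hidden size)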
comment in response to post
These various improvements, coupled with training on 2T tokens, result in fast and accurate models that handle long context (and code!). ModernBERT yields state-of-the-art results on various tasks, including IR (short and long context, text as well as code) and NLU
comment in response to post
Blog post: huggingface.co/blog/modernb... Paper link: arxiv.org/abs/2412.13663 With @lightonai.bsky.social folks, we collaborated with @answerai.bsky.social (and friends!) to leverage the advances from recent years of work on LLMs (architecture and training) and apply them to a BERT-style model
comment in response to post
So how does it work? ModernBERT brings modern engineering ideas from LLMs over to encoder models. We did so in three core ways: 1) a modernized transformer architecture; 2) particular attention to efficiency; 3) modern data scales & sources. Details here: arxiv.org/abs/2412.13663
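To get a feel for it as an encoder, a minimal masked-language-modelling sketch (checkpoint id assumed; ModernBERT fills [MASK] tokens using context from both directions):

from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")  # assumed checkpoint id
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))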
comment in response to post
GPT & friends are hamstrung: as generative models, they are mathematically β€œnot allowed” to β€œpeek” at later tokens. They can only ever look *backwards*. This is in contrast to encoder-only models like BERT, which are trained so each token can look forwards *and* backwards.
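The difference in one small sketch: a decoder's causal mask only lets position i attend to positions at or before i, while an encoder's mask lets every token attend to every other token:

import numpy as np

seq_len = 5
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))   # GPT-style: look backwards only
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)     # BERT-style: look both ways

print(causal_mask.astype(int))
print(bidirectional_mask.astype(int))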
comment in response to post
6 years after BERT, we have a replacement: ModernBERT! @answerdotai, @LightOnIO (et al) took dozens of advances from recent years of work on LLMs and applied them to a BERT-style model, including updates to the architecture and the training process, e.g. alternating attention.
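A rough sketch of what "alternating attention" means here, under the assumption that most layers use a local sliding window while periodic layers attend globally; the window size and period below are illustrative, see the paper for the real configuration:

import numpy as np

def layer_mask(layer_idx: int, seq_len: int, window: int = 128, global_every: int = 3) -> np.ndarray:
    # Illustrative only: every `global_every`-th layer attends globally,
    # the others restrict attention to a local band of +/- `window` tokens.
    if layer_idx % global_every == 0:
        return np.ones((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

print(layer_mask(0, 512).sum(), layer_mask(1, 512).sum())  # global vs local coverage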