ljupco.bsky.social
Now: quant trading & R&D. Prior: ASR in noise, synthesis, ML. In Harpenden, UK. From Skopje, MK. Open weights, open thoughts, free AI computation for e/acc - now! πŸ₯° Bsky https://tinyurl.com/3nxj7pcc (tap Latest) Home https://ljubomirj.github.io Hi! 😊
318 posts 644 followers 744 following
comment in response to post
files-to-prompt datasette -e py -c | \
  llm -m o3-mini -s \
  'write extensive documentation for how the permissions system works, as markdown'
Gave me this: gist.github.com/simonw/4a13c...
comment in response to post
By being integrated in Sentence Transformers without any other dependencies, all Static Embedding models work out of the box in all projects that integrate with Sentence Transformers, like: - @langchain.bsky.social
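A minimal sketch of what "out of the box" can look like through LangChain's Sentence Transformers wrapper; the langchain-huggingface package and the model id below are assumptions for illustration, not the post's own example:

# Sketch: a Static Embedding model used via LangChain's HuggingFaceEmbeddings,
# which delegates to Sentence Transformers under the hood.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/static-retrieval-mrl-en-v1")  # assumed id
vector = embeddings.embed_query("How do static embeddings work?")
print(len(vector))  # embedding dimensionality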
comment in response to post
πŸ“ Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more. πŸͺ† Matryoshka support: allow you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% performance decrease for English Similarity tasks) 🧡
comment in response to post
The Static Embedding models have some excellent properties: 🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for all-mpnet-base-v2 and 56 for gte-large-en-v1.5 πŸ“ No maximum sequence length! Embed texts at any length (at your own risk) 🧡
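A rough way to measure that kind of throughput on your own machine; the model id is an assumption, and batch size and hardware will change the number:

import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1", device="cpu")  # assumed id
sentences = ["An example sentence to embed."] * 10_000
start = time.perf_counter()
model.encode(sentences, batch_size=1024)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.0f} sentences/second")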
comment in response to post
Static Embedding models have been around since before Transformers (e.g. GloVe, word2vec): they work with pre-computed word embeddings from a mapping. I apply this simple architecture, but train it like a modern embedding model: Contrastive Learning with Matryoshka support. 🧡
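The underlying idea as a tiny toy sketch: each token has a pre-computed vector in a lookup table (the "mapping"), and a sentence embedding is just the mean of its token vectors, with no attention involved, which is where the speed comes from:

import numpy as np

# Toy vocabulary and pre-computed token embedding table; in practice this is trained and far larger.
vocab = {"static": 0, "embeddings": 1, "are": 2, "fast": 3}
table = np.random.rand(len(vocab), 8)

def embed(sentence: str) -> np.ndarray:
    ids = [vocab[token] for token in sentence.lower().split() if token in vocab]
    return table[ids].mean(axis=0)  # mean pooling over token vectors

print(embed("static embeddings are fast").shape)  # (8,)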
comment in response to post
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation πŸ“œ my training scripts, using the Sentence Transformers library πŸ“Š my Weights & Biases reports with losses & metrics πŸ“• my list of 30 training and 13 evaluation datasets 🧡
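A minimal sketch of what such a training setup can look like with the Sentence Transformers v3 trainer, assuming an in-batch-negatives contrastive loss wrapped in a Matryoshka loss; the model id, dataset, and dimensions are placeholders, not the actual scripts:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("path-or-id-of-a-static-embedding-model")  # placeholder
train_dataset = Dataset.from_dict({
    "anchor": ["what is a static embedding?", "fastest embedding model on cpu"],
    "positive": ["Static embeddings map tokens to pre-computed vectors.",
                 "Static models embed over 100k sentences per second on CPU."],
})

# Contrastive learning with in-batch negatives, wrapped so truncated embeddings are also trained.
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[1024, 512, 256, 128, 64])

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()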
comment in response to post
We apply our recipe to train 2 Static Embedding models that we release today: 2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (classification, clustering, etc.), both Apache 2.0. Fully integrated in Sentence Transformers, etc. 🧡
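A sketch of loading the two released models; the ids below are assumptions, check the blog post for the exact names:

from sentence_transformers import SentenceTransformer

retrieval = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")            # assumed id
similarity = SentenceTransformer("sentence-transformers/static-similarity-mrl-multilingual-v1")  # assumed id

query_emb = retrieval.encode("how to train static embeddings")
doc_embs = retrieval.encode(["A blog post on training static embedding models.",
                             "A recipe for chocolate cake."])
print(retrieval.similarity(query_emb, doc_embs))  # higher score for the relevant document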
comment in response to post
Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: hf.co/blog/static-... Or read more in this thread first 🧡
comment in response to post
β€’ Want to find your friends from other networks? Check out the Sky Follower Bridge Chrome extension built by an independent developer! chromewebstore.google.com/detail/sky-f...
comment in response to post
This article www.richardhanania.com/p/understand... articulates well, I thought, and delineates the new ideology of the Tech Right, and where it matches and where it contradicts the old American conservative Right. From Jun-2023, but the passage of time strengthens the confidence in the article imo.
comment in response to post
github.com/dockur/windows
comment in response to post
You can find the models here: huggingface.co/answerdotai/... huggingface.co/answerdotai/... If you want all the details, please have a look at the nicely written blog post and the very detailed paper. I'll go on with some less general and more personal information
comment in response to post
We also carefully designed the shapes of the models to optimize inference on common hardware. Coupled with unpadding through the full processing pipeline and Flash Attention, this allows ModernBERT to be two to three times faster than most encoders on long context on an RTX 4090
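A sketch of loading it with Flash Attention via transformers; the checkpoint id is an assumption, and the flash-attn package plus a supported GPU are required, otherwise the default attention implementation is used:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn installed
).to("cuda")

inputs = tokenizer("ModernBERT handles long context efficiently.", return_tensors="pt").to("cuda")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)  # (batch, sequence length, hidden size)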
comment in response to post
These various improvements, coupled with training on 2T tokens, result in fast and accurate models that handle long context (and code!). ModernBERT yields state-of-the-art results on various tasks, including IR (short and long context, text as well as code) and NLU
comment in response to post
Blog post: huggingface.co/blog/modernb... Paper link: arxiv.org/abs/2412.13663 With @lightonai.bsky.social folks, we collaborated with @answerai.bsky.social (and friends!) to leverage the advances from recent years of work on LLMs (architecture and training) and apply them to a BERT-style model
comment in response to post
So how does it work? ModernBERT brings modern engineering ideas from LLMs over to encoder models. We did so in three core ways: 1) a modernized transformer architecture; 2) particular attention to efficiency; 3) modern data scales & sources. Details here: arxiv.org/abs/2412.13663
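To get a feel for it as an encoder, a minimal masked-language-modelling sketch (checkpoint id assumed; ModernBERT fills [MASK] tokens using context from both directions):

from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")  # assumed checkpoint id
for candidate in fill("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))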
comment in response to post
GPT & friends are hamstrung: as generative models, they are mathematically β€œnot allowed” to β€œpeek” at later tokens. They can only ever look *backwards*. This is in contrast to encoder-only models like BERT, which are trained so each token can look forwards *and* backwards.
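The difference in one small sketch: a decoder's causal mask only lets position i attend to positions at or before i, while an encoder's mask lets every token attend to every other token:

import numpy as np

seq_len = 5
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))   # GPT-style: look backwards only
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)     # BERT-style: look both ways

print(causal_mask.astype(int))
print(bidirectional_mask.astype(int))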
comment in response to post
6 years after BERT, we have a replacement: ModernBERT! @answerdotai, @LightOnIO (et al) took dozens of advances from recent years of work on LLMs and applied them to a BERT-style model, including updates to the architecture and the training process, e.g. alternating attention.
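A rough sketch of what "alternating attention" means here, under the assumption that most layers use a local sliding window while periodic layers attend globally; the window size and period below are illustrative, see the paper for the real configuration:

import numpy as np

def layer_mask(layer_idx: int, seq_len: int, window: int = 128, global_every: int = 3) -> np.ndarray:
    # Illustrative only: every `global_every`-th layer attends globally,
    # the others restrict attention to a local band of +/- `window` tokens.
    if layer_idx % global_every == 0:
        return np.ones((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

print(layer_mask(0, 512).sum(), layer_mask(1, 512).sum())  # global vs local coverage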