vishal-learner.bsky.social
Machine Learning. https://fast.ai community member. Will post about sports occasionally. #FlyEaglesFly https://www.youtube.com/@vishal_learner
37 posts 45 followers 62 following

I made a video on the Hypencoder paper (arxiv.org/abs/2502.05364) in which they train a hypernetwork which takes in query token representations and outputs a neural net which takes in a single-vector document representation and outputs a relevance score! www.youtube.com/watch?v=-xWB...
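A minimal pure-Python sketch of that idea (toy dimensions and a made-up linear "hypernetwork", not the paper's architecture — in Hypencoder both pieces are learned networks):

```python
DIM = 4  # toy embedding size; real models use hundreds of dimensions

def hypernetwork(query_vec):
    """Toy 'hypernetwork': maps a query representation to the weights of
    a tiny one-layer scoring net (weight vector + bias). Purely
    illustrative -- the paper uses a learned transformer-based module."""
    weights = [q * 0.5 + 0.1 for q in query_vec]  # pretend learned projection
    bias = sum(query_vec) / len(query_vec)
    return weights, bias

def q_net(weights, bias, doc_vec):
    """The generated 'q-net': scores a single-vector document
    representation with the query-conditioned weights."""
    return sum(w * d for w, d in zip(weights, doc_vec)) + bias

query = [0.2, -0.1, 0.4, 0.3]  # hypothetical query representation
doc = [0.1, 0.0, 0.5, 0.2]     # hypothetical document vector
w, b = hypernetwork(query)
score = q_net(w, b, doc)       # relevance score for this (query, doc) pair
```

The key point the sketch preserves: the document side stays a single fixed vector, and all query-specific capacity lives in the weights the hypernetwork emits.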

One of the questions we debated while training ModernBERT was whether a modern trained encoder could unlock zero-shot reasoning using only its generative head. Spoiler: the answer is yes.

I made a video (30 mins) walking through the code and concepts of the peft library's DoRA implementation. I was inspired to dig into this after reading/running Raschka's DoRA and LoRA from scratch magazine article! youtu.be/GE6jRudHhzY?...

I made a video on the rsLoRA paper, in which we learn that low ranks are not "sufficient"; rather, a particular scaling factor (alpha/sqrt(r)) is needed to stabilize training and unlock increased performance for high LoRA ranks. 📈📈📈 www.youtube.com/watch?v=TVfd...
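A tiny sketch of the difference (hypothetical alpha value, not the paper's code): standard LoRA multiplies the low-rank update by alpha/r, while rsLoRA uses alpha/sqrt(r), so at high ranks the standard factor shrinks the update toward zero while the rank-stabilized one decays much more slowly.

```python
import math

ALPHA = 16  # hypothetical LoRA alpha hyperparameter

def lora_scale(r):
    """Standard LoRA scaling factor: gamma = alpha / r."""
    return ALPHA / r

def rslora_scale(r):
    """Rank-stabilized LoRA scaling factor: gamma = alpha / sqrt(r)."""
    return ALPHA / math.sqrt(r)

# Compare how the two factors behave as rank grows.
for r in (8, 64, 256):
    print(f"r={r:4d}  lora={lora_scale(r):.4f}  rslora={rslora_scale(r):.4f}")
```

At r=256 the standard factor is 0.0625 while the rank-stabilized factor is 1.0, which is the mechanism behind the paper's claim that high ranks only help with the sqrt scaling.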

For my first paper summary/presentation video (36 min), I walk through the "LoRA Learns Less and Forgets Less" paper by Biderman et al. Here are the main takeaways and their recommendations for using LoRA. Planning to do these regularly! πŸ“„πŸ”πŸ§  www.youtube.com/watch?v=0p6H...

Wrote up my notes on ModernBERT, the brand new modern alternative to 2018-era BERT released by @benjaminwarner.dev and @howard.fm and team simonwillison.net/2024/Dec/24/...
Wrote up my notes on ModernBERT, the brand new modern alternative to 2018-era BERT released by @benjaminwarner.dev and @howard.fm and team simonwillison.net/2024/Dec/24/...

This week we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.

Published a blog post version of my PLAID scoring pipeline code walkthrough with explanatory/narrative text! vishalbakshi.github.io/blog/posts/2...

New LLM Eval Office Hours: I discuss the importance of doing error analysis before jumping into metrics and tests. Links to notes in the YT description. youtu.be/ZEvXvyY17Ys?...

Very excited to share my step-by-step walkthrough of the colbert repo code needed to recreate the 4-stage PLAID scoring pipeline to match RAGatouille results! Amazing to see centroids, scores, passage token encodings & passage IDs come to life!! youtu.be/XRPP5LHHk0o (1/2)

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧡

I published a 6-video series on Information Retrieval fundamentals! 1. fastbook-benchmark Overview 2. Document Processing 3. Full Text Search 4. Scoring Retrieval Results 5. Single Vector Search 6. ColBERT search Code: github.com/vishalbakshi... YT Playlist: www.youtube.com/watch?v=VsVI...

Excited to launch fastbook-benchmark: a dataset of 191 multi-component QA pairs for evaluating IR on complex educational/technical content! GitHub: github.com/vishalbakshi... Video walkthrough: www.youtube.com/watch?v=VsVI... Video walkthroughs of retrieval experiments coming soon! πŸš€

Just posted my second video! Implementing image-to-image generation with stable diffusion: youtu.be/POisZHNP23c I got a different headset (Audio-Technica ATH-M50xSTS-USB) and I find the audio quality and ease of use much improved over the XLR broadcasting headset I used for my first vid. LET'S GO 🚀

I'm starting to record ML videos! I'll be focusing on what I'm learning in part 2 of the fastai course, research papers I'm reading, and projects I'm working on like fastbookRAG. Kicking it off with coding through negative prompting in stable diffusion. LET'S GO!! πŸš€πŸš€πŸš€ www.youtube.com/watch?v=_nzR...

Gonna try this out with the ColBERT repo this week. Exciting!!!

Chat with any open source repo easily. Gitingest (a free online tool) turns any GitHub repository into a single Markdown file for pasting. Claude Artifacts makes this 300k-token output pretty easy to work with.

How does the script work?
- It connects to the Bluesky API with your username/password
- You pass it a handle and it retrieves that handle's replies from the last 72 hours
- It iterates over the replies and runs the OpenAI moderation API on each one (you can replace OpenAI with the moderation filter of your liking)
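The steps above could be sketched roughly like this. The two endpoints are the real public Bluesky XRPC and OpenAI moderation routes, but the helper names (`fetch_posts`, `moderate`) and all the glue code are my own illustration, not the original script:

```python
import datetime as dt
import json
import urllib.request

# Real public endpoints; the surrounding code is an illustrative sketch.
FEED_URL = ("https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
            "?actor={handle}&limit=100")
MODERATION_URL = "https://api.openai.com/v1/moderations"

def within_last_72h(created_at_iso, now=None):
    """Step 2: keep only posts created within the last 72 hours."""
    now = now or dt.datetime.now(dt.timezone.utc)
    created = dt.datetime.fromisoformat(created_at_iso.replace("Z", "+00:00"))
    return (now - created) <= dt.timedelta(hours=72)

def fetch_posts(handle):
    """Steps 1-2: pull the handle's recent feed, keep only fresh posts."""
    with urllib.request.urlopen(FEED_URL.format(handle=handle)) as resp:
        feed = json.load(resp)["feed"]
    return [item["post"]["record"]["text"]
            for item in feed
            if within_last_72h(item["post"]["record"]["createdAt"])]

def moderate(text, api_key):
    """Step 3: run one post through the OpenAI moderation endpoint."""
    req = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"input": text}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["flagged"]
```

Any moderation filter with a text-in, verdict-out shape could be swapped in for `moderate` without touching the fetch/filter steps.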

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.

i am surprised that only 23 posts in, i'm already blocked by 3 people!

An AI researcher that wants to stop big tech owning everything was permabanned here for releasing a dataset of 2M posts. A librarian received death threats for a 1M post dataset. The EU funded the creation of this dataset of 235M posts months ago, and… nothing? zenodo.org/records/1108...

I wrote a thing about "Storing time for human events" - how if you're building an events website used by actual human beings the standard advice of "convert times to UTC and just store that" isn't actually the best approach simonwillison.net/2024/Nov/27/...

It's pretty sad to see the negative sentiment towards Hugging Face on this platform due to a dataset published by one of its employees. I want to write a small piece. 🧵 Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI; it's a hosting platform above all.

testing if I can see the GIF animation after uploading and posting it

eval'd 24 retrieval method/chunking combos w/ fastbook-benchmark in an order of mag less time than manual evals. answerai-colbert-small-v1 had the best MRR@10, single-vec cos similarity (!) had the best Recall@10 Colab: drive.google.com/file/d/1joMU... Blog: vishalbakshi.github.io/blog/posts/2...
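A minimal sketch of the two metrics named above (not the benchmark's actual eval code; the example ranking and gold set are hypothetical): MRR@10 rewards placing the first relevant passage high, Recall@10 measures what fraction of the gold passages land in the top 10.

```python
def mrr_at_10(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant doc in the top 10, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:10], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_10(ranked_ids, relevant_ids):
    """Fraction of the relevant docs that appear in the top 10."""
    hits = sum(1 for doc_id in ranked_ids[:10] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

ranked = ["d3", "d7", "d1", "d9"]  # retrieval output, best first (toy)
gold = {"d7", "d42"}               # hypothetical relevant set
print(mrr_at_10(ranked, gold), recall_at_10(ranked, gold))
```

Averaging each metric over all benchmark questions gives the per-method numbers compared in the post.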

Hello bsky! This is my 1st time on this platform so I want to share what I post about, which will mostly be ML but occasionally sports. I'm currently working on 4 projects that I'll detail in this thread: Part 2 of the fastai course, fastbookRAG, TinySentiment, and TypefaceClassifier (1/6)

me, Claude and ChatGPT all are struggling with regex today

A blindspot for AI reasoning engines like o1 is that they all appear to be trained on very traditional deductive problem solving for chain of thought. What would a model trained on induction or abduction do? What about one trained on free association? Expert heuristics? Randomized exquisite corpse?

Reading ColBERTv2 paper helped me better understand vanilla ColBERT and now reading PLAID is helping me better understand ColBERTv2: partly bc exposure, partly bc imo the authors describe previous concepts MUCH more clearly in the next paper. I think it's bc maybe they understand it better as well?

My Bluesky follower count (1.6k followers) has now surpassed my Threads follower count (1.1k). I still see a few AI folks on Threads but it seems so much more dead compared to Bluesky.

wwyd?

A concern about bsky is whether it will manage to stay good even as it grows. One of the various issues with Twitter is that a very large fraction of accounts are bots or otherwise fake users (trolls). This is just a direct consequence of scale + lack of moderation.