vishal-learner.bsky.social
Machine Learning. https://fast.ai community member. Will post about sports occasionally. #FlyEaglesFly https://www.youtube.com/@vishal_learner
37 posts 45 followers 62 following

I made a video on the Hypencoder paper (arxiv.org/abs/2502.05364) in which they train a hypernetwork which takes in query token representations and outputs a neural net which takes in a single-vector document representation and outputs a relevance score! www.youtube.com/watch?v=-xWB...
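A minimal pure-Python sketch of that idea (toy dimensions and a made-up linear "hypernetwork", not the paper's architecture — in Hypencoder both pieces are learned networks):

```python
DIM = 4  # toy embedding size; real models use hundreds of dimensions

def hypernetwork(query_vec):
    """Toy 'hypernetwork': maps a query representation to the weights of
    a tiny one-layer scoring net (weight vector + bias). Purely
    illustrative -- the paper uses a learned transformer-based module."""
    weights = [q * 0.5 + 0.1 for q in query_vec]  # pretend learned projection
    bias = sum(query_vec) / len(query_vec)
    return weights, bias

def q_net(weights, bias, doc_vec):
    """The generated 'q-net': scores a single-vector document
    representation with the query-conditioned weights."""
    return sum(w * d for w, d in zip(weights, doc_vec)) + bias

query = [0.2, -0.1, 0.4, 0.3]  # hypothetical query representation
doc = [0.1, 0.0, 0.5, 0.2]     # hypothetical document vector
w, b = hypernetwork(query)
score = q_net(w, b, doc)       # relevance score for this (query, doc) pair
```

The key point the sketch preserves: the document side stays a single fixed vector, and all query-specific capacity lives in the weights the hypernetwork emits.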

One of the questions we debated while training ModernBERT was whether a modern trained encoder could unlock zero-shot reasoning using only its generative head. Spoiler: the answer is yes.

I made a video (30 mins) walking through the code and concepts of the peft library's DoRA implementation. I was inspired to dig into this after reading/running Raschka's DoRA and LoRA from scratch magazine article! youtu.be/GE6jRudHhzY?...

I made a video on the rsLoRA paper, in which we learn that low ranks are not "sufficient"; rather, a particular scaling factor (alpha/sqrt(r)) is needed to stabilize training and unlock increased performance for high LoRA ranks. 📈📈📈 www.youtube.com/watch?v=TVfd...
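A tiny sketch of the difference (hypothetical alpha value, not the paper's code): standard LoRA multiplies the low-rank update by alpha/r, while rsLoRA uses alpha/sqrt(r), so at high ranks the standard factor shrinks the update toward zero while the rank-stabilized one decays much more slowly.

```python
import math

ALPHA = 16  # hypothetical LoRA alpha hyperparameter

def lora_scale(r):
    """Standard LoRA scaling factor: gamma = alpha / r."""
    return ALPHA / r

def rslora_scale(r):
    """Rank-stabilized LoRA scaling factor: gamma = alpha / sqrt(r)."""
    return ALPHA / math.sqrt(r)

# Compare how the two factors behave as rank grows.
for r in (8, 64, 256):
    print(f"r={r:4d}  lora={lora_scale(r):.4f}  rslora={rslora_scale(r):.4f}")
```

At r=256 the standard factor is 0.0625 while the rank-stabilized factor is 1.0, which is the mechanism behind the paper's claim that high ranks only help with the sqrt scaling.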

For my first paper summary/presentation video (36 min), I walk through the "LoRA Learns Less and Forgets Less" paper by Biderman et al. Here are the main takeaways and their recommendations for using LoRA. Planning to do these regularly! πŸ“„πŸ”πŸ§  www.youtube.com/watch?v=0p6H...

Wrote up my notes on ModernBERT, the brand new modern alternative to 2018-era BERT released by @benjaminwarner.dev and @howard.fm and team simonwillison.net/2024/Dec/24/...
Wrote up my notes on ModernBERT, the brand new modern alternative to 2018-era BERT released by @benjaminwarner.dev and @howard.fm and team simonwillison.net/2024/Dec/24/...

This week we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.

Published a blog post version of my PLAID scoring pipeline code walkthrough with explanatory/narrative text! vishalbakshi.github.io/blog/posts/2...

New LLM Eval Office Hours: I discuss the importance of doing error analysis before jumping into metrics and tests. Links to notes in the YT description. youtu.be/ZEvXvyY17Ys?...

Very excited to share my step-by-step walkthrough of the colbert repo code needed to recreate the 4-stage PLAID scoring pipeline to match RAGatouille results! Amazing to see centroids, scores, passage token encodings & passage IDs come to life!! youtu.be/XRPP5LHHk0o (1/2)

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧡

I published a 6-video series on Information Retrieval fundamentals! 1. fastbook-benchmark Overview 2. Document Processing 3. Full Text Search 4. Scoring Retrieval Results 5. Single Vector Search 6. ColBERT search Code: github.com/vishalbakshi... YT Playlist: www.youtube.com/watch?v=VsVI...

Excited to launch fastbook-benchmark: a dataset of 191 multi-component QA pairs for evaluating IR on complex educational/technical content! GitHub: github.com/vishalbakshi... Video walkthrough: www.youtube.com/watch?v=VsVI... Video walkthroughs of retrieval experiments coming soon! πŸš€

Just posted my second video! Implementing image-to-image generation with stable diffusion: youtu.be/POisZHNP23c I got a different headset (Audio-Technica ATH-M50xSTS-USB) and I find the audio quality and ease of use much improved over the XLR broadcasting headset I used for my first vid. LET'S GO 🚀

I'm starting to record ML videos! I'll be focusing on what I'm learning in part 2 of the fastai course, research papers I'm reading, and projects I'm working on like fastbookRAG. Kicking it off with coding through negative prompting in stable diffusion. LET'S GO!! πŸš€πŸš€πŸš€ www.youtube.com/watch?v=_nzR...

Gonna try this out with the ColBERT repo this week. Exciting!!!

Chat with any open source repo easily. Gitingest (a free online tool) turns any GitHub repository into a single Markdown file for pasting. Claude Artifacts makes this 300k-token output pretty easy to work with.

How does the script work?
- It connects to the Bluesky API with your username/password
- You pass it a handle and it retrieves that handle's replies from the last 72 hours
- It iterates over the replies and runs the OpenAI moderation API on each one (you can replace OpenAI with the moderation filter of your liking)
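The steps above could be sketched roughly like this. The two endpoints are the real public Bluesky XRPC and OpenAI moderation routes, but the helper names (`fetch_posts`, `moderate`) and all the glue code are my own illustration, not the original script:

```python
import datetime as dt
import json
import urllib.request

# Real public endpoints; the surrounding code is an illustrative sketch.
FEED_URL = ("https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
            "?actor={handle}&limit=100")
MODERATION_URL = "https://api.openai.com/v1/moderations"

def within_last_72h(created_at_iso, now=None):
    """Step 2: keep only posts created within the last 72 hours."""
    now = now or dt.datetime.now(dt.timezone.utc)
    created = dt.datetime.fromisoformat(created_at_iso.replace("Z", "+00:00"))
    return (now - created) <= dt.timedelta(hours=72)

def fetch_posts(handle):
    """Steps 1-2: pull the handle's recent feed, keep only fresh posts."""
    with urllib.request.urlopen(FEED_URL.format(handle=handle)) as resp:
        feed = json.load(resp)["feed"]
    return [item["post"]["record"]["text"]
            for item in feed
            if within_last_72h(item["post"]["record"]["createdAt"])]

def moderate(text, api_key):
    """Step 3: run one post through the OpenAI moderation endpoint."""
    req = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"input": text}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["flagged"]
```

Any moderation filter with a text-in, verdict-out shape could be swapped in for `moderate` without touching the fetch/filter steps.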

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.

i am surprised that only 23 posts in, i'm already blocked by 3 people!

An AI researcher that wants to stop big tech owning everything was permabanned here for releasing a dataset of 2M posts. A librarian received death threats for a 1M post dataset. The EU funded the creation of this dataset of 235M posts months ago, and… nothing? zenodo.org/records/1108...

I wrote a thing about "Storing time for human events" - how if you're building an events website used by actual human beings the standard advice of "convert times to UTC and just store that" isn't actually the best approach simonwillison.net/2024/Nov/27/...

It's pretty sad to see the negative sentiment towards Hugging Face on this platform due to a dataset published by one of its employees. I want to write a small piece. 🧵 Hugging Face empowers everyone to use AI to create value and is against the monopolization of AI; it's a hosting platform above all.

testing if I can see the GIF animation after uploading and posting it

eval'd 24 retrieval method/chunking combos w/ fastbook-benchmark in an order of mag less time than manual evals. answerai-colbert-small-v1 had the best MRR@10, single-vec cos similarity (!) had the best Recall@10 Colab: drive.google.com/file/d/1joMU... Blog: vishalbakshi.github.io/blog/posts/2...
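A minimal sketch of the two metrics named above (not the benchmark's actual eval code; the example ranking and gold set are hypothetical): MRR@10 rewards placing the first relevant passage high, Recall@10 measures what fraction of the gold passages land in the top 10.

```python
def mrr_at_10(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant doc in the top 10, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:10], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_10(ranked_ids, relevant_ids):
    """Fraction of the relevant docs that appear in the top 10."""
    hits = sum(1 for doc_id in ranked_ids[:10] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

ranked = ["d3", "d7", "d1", "d9"]  # retrieval output, best first (toy)
gold = {"d7", "d42"}               # hypothetical relevant set
print(mrr_at_10(ranked, gold), recall_at_10(ranked, gold))
```

Averaging each metric over all benchmark questions gives the per-method numbers compared in the post.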

Hello bsky! This is my 1st time on this platform so I want to share what I post about, which will mostly be ML but occasionally sports. I'm currently working on 4 projects that I'll detail in this thread: Part 2 of the fastai course, fastbookRAG, TinySentiment, and TypefaceClassifier (1/6)

me, Claude and ChatGPT all are struggling with regex today

A blindspot for AI reasoning engines like o1 is that they all appear to be trained on very traditional deductive problem solving for chain of thought. What would a model trained on induction or abduction do? What about one trained on free association? Expert heuristics? Randomized exquisite corpse?

Reading ColBERTv2 paper helped me better understand vanilla ColBERT and now reading PLAID is helping me better understand ColBERTv2: partly bc exposure, partly bc imo the authors describe previous concepts MUCH more clearly in the next paper. I think it's bc maybe they understand it better as well?

My Bluesky follower count (1.6k followers) has now surpassed my Threads follower count (1.1k). I still see a few AI folks on Threads but it seems so much more dead compared to Bluesky.

wwyd?

A concern about bsky is whether it will manage to stay good even as it grows. One of the various issues with Twitter is that a very large fraction of accounts are bots or otherwise fake users (trolls). This is just a direct consequence of scale + lack of moderation.