xhluca.bsky.social
๐Ÿ‘จโ€๐Ÿณ Web Agents @mila-quebec.bsky.social ๐ŸŽ’ @mcgill-nlp.bsky.social
41 posts 646 followers 151 following

Check out the new MMTEB benchmark 🙌 if you are looking for an extensive, reproducible, and open-source evaluation of text embedders!

I'm fortunate to have collaborated with a team of brilliant researchers on this colossal project 🎊 Among the tasks I contributed, I'm most excited about the contextual web element retrieval task derived from WebLINX, which I think is a crucial component for building web agents!

Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors Dima Bahdanau and @sivareddyg.bsky.social. Thread 🧵:

I am delighted to announce that we have released 🎊 MMTEB 🎊, a large-scale collaboration working on efficient multilingual evaluation of embedding models. This work implements >500 evaluation tasks across >1000 languages and covers a wide range of use cases and domains 🩺👩‍💻⚖️
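
If you want to try it on your own embedder, evaluation runs through the mteb package; here is a minimal sketch along the lines of its documented quickstart (the task and model names are just examples, swap in your own):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers (or custom) encoder works; this model is an example
model = SentenceTransformer("all-MiniLM-L6-v2")

# Pick any of the >500 tasks by name; Banking77Classification is an example
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results")
```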

Interested in knowing more about LLM agents and in contributing to this topic? 🚀 📢 We're thrilled to announce REALM: the first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻 We have an exciting lineup of speakers 🗓️ Submit your work by *March 1st* @aclmeeting.bsky.social

Glad to see BM25S (bm25s.github.io) has been downloaded 1M times on PyPI 🎉 Numbers aside, it makes me happy to hear positive experiences from friends working on retrieval. It's good to know that people near me are enjoying it! Discussion: github.com/xhluca/bm25s/discussions
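
For anyone who hasn't tried it, the whole flow is a few lines; a minimal sketch in the spirit of the README quickstart (corpus and query are toy examples):

```python
import bm25s

corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
]

# Tokenize the corpus and build the BM25 index
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(corpus))

# Retrieve the top-2 documents for a query (returns indices and scores)
results, scores = retriever.retrieve(bm25s.tokenize("does the fish purr like a cat?"), k=2)
```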

Retrieval seems to be a rather challenging problem even in the era of LLMs: a lot of benchmarks do not seem to be saturated yet, e.g. the best score on a 7-year-old benchmark like DBpedia is around 0.53 NDCG@10. I wonder if it's a lack of focus or if they are truly challenging problems to solve...
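
For intuition on what that number means, here is a small self-contained sketch of the metric (toy relevance labels, not DBpedia data): NDCG@10 discounts each result's relevance by its rank and normalizes by the ideal ordering, so a run whose single relevant document lands at rank 3 scores 0.5, right in the ballpark of that 0.53.

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance labels."""
    def dcg(rels):
        # 0-based index i -> rank i+1, discounted by log2(rank + 1)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# One relevant document, ranked 3rd out of 10 -> NDCG@10 = 0.5
print(ndcg_at_k([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
```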

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵
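
Since it's a drop-in encoder, a quick way to poke at it is the Transformers fill-mask pipeline; a minimal smoke-test sketch, assuming the release checkpoint id is answerdotai/ModernBERT-base:

```python
from transformers import pipeline

# Masked-LM smoke test; ModernBERT uses the standard [MASK] token
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```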

Really glad that this work is out! AgentLab and BrowserGym will be, in my opinion, very important components of web agent research and a key part of most web agent researchers' toolkits. Read the paper if you are interested in learning more about what the platform covers!

Glad to be part of this great collaborative effort 😊

We're really excited to release this large collaborative work for unifying web agent benchmarks under the same roof. In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet

Finally, it's handy that all my Twitter posts got migrated here to bsky: I'll be presenting AURORA at @neuripsconf.bsky.social on Wednesday! Come by to discuss text-guided editing (and why, imo, it is more interesting than image generation), world modeling, evals, and vision-and-language reasoning

Tomorrow at 3:15pm I'll be presenting my work at @mila-quebec.bsky.social's booth (#104) at @neuripsconf.bsky.social. Come to learn more about controlling multimodal LLMs via reward-guided decoding! 🔗 openreview.net/forum?id=VWJ...
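
For anyone new to the idea, here is a toy illustration of the general reward-guided decoding recipe, NOT the paper's exact method: at each step, rerank candidate continuations by base-LM log-probability plus a weighted reward score (the reward and LM below are stand-in stubs):

```python
def reward_score(text: str) -> float:
    """Stand-in stub for a learned reward model."""
    return 1.0 if "helpful" in text else 0.0

def lm_logprob(prefix: str, continuation: str) -> float:
    """Stand-in stub for the base LM's log-probability of a continuation."""
    return -0.1 * len(continuation)

def guided_step(prefix: str, candidates: list[str], alpha: float = 0.5) -> str:
    # Pick the candidate maximizing log p_LM + alpha * reward
    return max(candidates,
               key=lambda c: lm_logprob(prefix, c) + alpha * reward_score(prefix + c))

print(guided_step("the assistant is ", ["helpful", "verbose", "terse"]))
```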

Awesome Starter Pack. Thanks @xhluca.bsky.social

I've created a starter pack of researchers working on digital agents (focusing on web, mobile and OS agents). I am missing a lot, and many are not on bsky yet, so if I missed you or someone you know, please send me a DM with the link to a relevant paper and I will update the starter pack!

🧵-1 We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package, which supports 10 different benchmarks, including #WebArena.
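
A rough sketch of the Gymnasium-style loop, from my reading of the README; treat the import path, env id, and string-action format as assumptions to double-check against the docs:

```python
import gymnasium as gym
import browsergym.miniwob  # noqa: F401 -- importing registers the MiniWoB tasks

# Env ids follow the "browsergym/<benchmark>.<task>" pattern
env = gym.make("browsergym/miniwob.click-test")
obs, info = env.reset()

# Observations expose the page (DOM/AXTree, screenshot); actions are
# high-level browser commands addressed by element ids, e.g.:
obs, reward, terminated, truncated, info = env.step('click("14")')
env.close()
```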

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.

I'm disheartened by how toxic and violent some responses were here. There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years, and he is one of the people most concerned with the ethical implications of AI. Some replies are Reddit-level toxic. We need empathy.

Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...
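
A minimal generation sketch in the usual Transformers vision-to-seq style; the checkpoint id (HuggingFaceTB/SmolVLM-Instruct) and chat-template flow are my assumptions from the standard model-card convention, so verify against the collection:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed id; check the HF collection
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Standard chat-template flow: one image placeholder plus a text instruction
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[Image.open("photo.jpg")], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```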

Mila is such a large community. One starter pack just isn't enough! After @josephdviviano.bsky.social's Mila list filled up, I decided to make another one. Will continue to add members until this one is full too. go.bsky.app/9nXTDHo

Excited to share OLMo 2! 🐟 7B and 13B weights, trained up to 4-5T tokens, fully open data, code, etc 🐠 better architecture and recipe for training stability 🐡 staged training, with new data mix Dolmino 🍕 added during annealing 🦈 state-of-the-art OLMo 2 Instruct models #nlp #mlsky links below 👇
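
Since the weights ship in Transformers-compatible form, loading should look like the usual causal-LM flow; a sketch assuming the 7B base checkpoint id is allenai/OLMo-2-1124-7B (confirm via the links below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed 7B base checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```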

It turns out we had even more papers at EMNLP! Let's complete the list with three more 🧵