xhluca.bsky.social
๐Ÿ‘จโ€๐Ÿณ Web Agents @mila-quebec.bsky.social ๐ŸŽ’ @mcgill-nlp.bsky.social
41 posts 646 followers 151 following

Check out the new MMTEB benchmark 🙌 if you are looking for an extensive, reproducible, and open-source evaluation of text embedders!

I'm fortunate to have collaborated with a team of brilliant researchers on this colossal project 🎊 Among the tasks I contributed, I'm most excited about the contextual web element retrieval task derived from WebLINX, which I think is a crucial component for building web agents!

Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors Dima Bahdanau and @sivareddyg.bsky.social. Thread 🧵:

I am delighted to announce that we have released 🎊 MMTEB 🎊, a large-scale collaboration working on efficient multilingual evaluation of embedding models. This work implements >500 evaluation tasks across >1000 languages and covers a wide range of use cases and domains 🩺👩‍💻⚖️
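
If you want to try it on your own embedder, evaluation runs through the mteb package; here is a minimal sketch along the lines of its documented quickstart (the task and model names are just examples, swap in your own):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence-transformers (or custom) encoder works; this model is an example
model = SentenceTransformer("all-MiniLM-L6-v2")

# Pick any of the >500 tasks by name; Banking77Classification is an example
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results")
```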

Interested in knowing more about LLM agents and in contributing to this topic? 🚀 📢 We're thrilled to announce REALM: the first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻 We have an exciting lineup of speakers 🗓️ Submit your work by *March 1st* @aclmeeting.bsky.social

Glad to see BM25S (bm25s.github.io) has been downloaded 1M times on PyPI 🎉 Numbers aside, it makes me happy to hear positive experiences from friends working on retrieval. It's good to know that people near me are enjoying it! Discussion: github.com/xhluca/bm25s/discussions
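
For anyone who hasn't tried it, the whole flow is a few lines; a minimal sketch in the spirit of the README quickstart (corpus and query are toy examples):

```python
import bm25s

corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
]

# Tokenize the corpus and build the BM25 index
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(corpus))

# Retrieve the top-2 documents for a query (returns indices and scores)
results, scores = retriever.retrieve(bm25s.tokenize("does the fish purr like a cat?"), k=2)
```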

Retrieval seems to be a rather challenging problem even in the era of LLMs: a lot of benchmarks do not seem to be saturated yet, e.g. the best score on a 7-year-old benchmark like DBpedia is around 0.53 NDCG@10. I wonder if it's a lack of focus or if they are truly challenging problems to solve...
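
For intuition on what that number means, here is a small self-contained sketch of the metric (toy relevance labels, not DBpedia data): NDCG@10 discounts each result's relevance by its rank and normalizes by the ideal ordering, so a run whose single relevant document lands at rank 3 scores 0.5, right in the ballpark of that 0.53.

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance labels."""
    def dcg(rels):
        # 0-based index i -> rank i+1, discounted by log2(rank + 1)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# One relevant document, ranked 3rd out of 10 -> NDCG@10 = 0.5
print(ndcg_at_k([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
```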

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵
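
Since it's a drop-in encoder, a quick way to poke at it is the Transformers fill-mask pipeline; a minimal smoke-test sketch, assuming the release checkpoint id is answerdotai/ModernBERT-base:

```python
from transformers import pipeline

# Masked-LM smoke test; ModernBERT uses the standard [MASK] token
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```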

Really glad that this work is out! AgentLab and BrowserGym will be, in my opinion, very important components of web agent research and a key part of most web agent researchers' toolkits. Read the paper if you are interested in learning more about what the platform covers!

Glad to be part of this great collaborative effort 😊

We're really excited to release this large collaborative work for unifying web agent benchmarks under the same roof. In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected performance results from Claude 3.5 Sonnet

Finally, it's handy that all my Twitter posts got migrated here to bsky: I'll be presenting AURORA at @neuripsconf.bsky.social on Wednesday! Come by to discuss text-guided editing (and why, imo, it is more interesting than image generation), world modeling, evals, and vision-and-language reasoning

Tomorrow at 3:15pm I'll be presenting my work at @mila-quebec.bsky.social's booth (#104) at @neuripsconf.bsky.social. Come to learn more about controlling multimodal LLMs via reward-guided decoding! 🔗 openreview.net/forum?id=VWJ...
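
For anyone new to the idea, here is a toy illustration of the general reward-guided decoding recipe, NOT the paper's exact method: at each step, rerank candidate continuations by base-LM log-probability plus a weighted reward score (the reward and LM below are stand-in stubs):

```python
def reward_score(text: str) -> float:
    """Stand-in stub for a learned reward model."""
    return 1.0 if "helpful" in text else 0.0

def lm_logprob(prefix: str, continuation: str) -> float:
    """Stand-in stub for the base LM's log-probability of a continuation."""
    return -0.1 * len(continuation)

def guided_step(prefix: str, candidates: list[str], alpha: float = 0.5) -> str:
    # Pick the candidate maximizing log p_LM + alpha * reward
    return max(candidates,
               key=lambda c: lm_logprob(prefix, c) + alpha * reward_score(prefix + c))

print(guided_step("the assistant is ", ["helpful", "verbose", "terse"]))
```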

Awesome Starter Pack. Thanks @xhluca.bsky.social

I've created a starter pack of researchers working on digital agents (focusing on web, mobile and OS agents). I am missing a lot, and many are not on bsky yet, so if I missed you or someone you know, please send me a DM with the link to a relevant paper and I will update the starter pack!

🧵-1 We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package, which supports 10 different benchmarks, including #WebArena.
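
A rough sketch of the Gymnasium-style loop, from my reading of the README; treat the import path, env id, and string-action format as assumptions to double-check against the docs:

```python
import gymnasium as gym
import browsergym.miniwob  # noqa: F401 -- importing registers the MiniWoB tasks

# Env ids follow the "browsergym/<benchmark>.<task>" pattern
env = gym.make("browsergym/miniwob.click-test")
obs, info = env.reset()

# Observations expose the page (DOM/AXTree, screenshot); actions are
# high-level browser commands addressed by element ids, e.g.:
obs, reward, terminated, truncated, info = env.step('click("14")')
env.close()
```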

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.

I'm disheartened by how toxic and violent some responses were here. There was a mistake, a quick follow-up to mitigate it, and an apology. I worked with Daniel for years, and he is one of the people most concerned with the ethical implications of AI. Some replies are Reddit-level toxic. We need empathy.

Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...
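
A minimal generation sketch in the usual Transformers vision-to-seq style; the checkpoint id (HuggingFaceTB/SmolVLM-Instruct) and chat-template flow are my assumptions from the standard model-card convention, so verify against the collection:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed id; check the HF collection
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Standard chat-template flow: one image placeholder plus a text instruction
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[Image.open("photo.jpg")], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```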

Mila is such a large community. One starter pack just isn't enough! After @josephdviviano.bsky.social's Mila list filled up, I decided to make another one. Will continue to add members until this one is full too. go.bsky.app/9nXTDHo

Excited to share OLMo 2! 🐟 7B and 13B weights, trained up to 4-5T tokens, fully open data, code, etc 🐠 better architecture and recipe for training stability 🐡 staged training, with new data mix Dolmino 🍕 added during annealing 🦈 state-of-the-art OLMo 2 Instruct models #nlp #mlsky links below 👇
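
Since the weights ship in Transformers-compatible form, loading should look like the usual causal-LM flow; a sketch assuming the 7B base checkpoint id is allenai/OLMo-2-1124-7B (confirm via the links below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed 7B base checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```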

It turns out we had even more papers at EMNLP! Let's complete the list with three more 🧵