marcel.bollmann.me
Associate professor at Linköping University 🇸🇪, site development lead for @aclanthology.org, editor-in-chief at @nejlt.bsky.social. Mildly obscure #NLP researcher. I like coffee and board games. 🏠 https://marcel.bollmann.me/
89 posts 1,697 followers 333 following

Excited to be traveling to Estonia for the 1st time to give a keynote @nodalida.bsky.social. I'll talk about using NNs to study language evolution & acquisition. A teaser: It won't be about LLMs 🙃 Also I've just moved from X, so this was my very first post... Pls help out by connecting with me!

Honest question: how do people in NLP deal with the enormous stream of papers 📄 coming in on a daily basis? I would need three times the hours ⏳ to finish my reading list before new papers come in again 📚

Does anyone know why the Llama (et al.) tokenizer has duplicate merges (e.g., "▁bas ically" and "▁basic ally")? The vocabulary has ~32k tokens, but ~64k merges. Not all tokens have duplicate merges that form them, but some have many ("▁render" has 6 and "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁" [all spaces] has 15!).
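One way to check the duplicate-merge observation above is to group the merge rules by the token they produce and count the groups with more than one rule. This is a minimal sketch, assuming the merges have already been loaded as (left, right) string pairs; the function name `find_duplicate_merges` and the toy merge list are hypothetical and are not taken from the actual Llama tokenizer files.

```python
from collections import defaultdict


def find_duplicate_merges(merges):
    """Group BPE merges by the token they produce.

    Returns a dict mapping each resulting token to its list of
    (left, right) merges, keeping only tokens formed by more than one merge.
    """
    by_token = defaultdict(list)
    for left, right in merges:
        # Concatenating both halves gives the token this merge creates.
        by_token[left + right].append((left, right))
    return {tok: pairs for tok, pairs in by_token.items() if len(pairs) > 1}


# Toy merge list for illustration only (NOT the real Llama merge table).
toy_merges = [
    ("▁bas", "ically"),
    ("▁basic", "ally"),
    ("▁re", "nder"),
    ("▁", "the"),
]

duplicates = find_duplicate_merges(toy_merges)
for token, pairs in duplicates.items():
    print(token, len(pairs), pairs)
```

On a real merge table, sorting the result by `len(pairs)` would surface the tokens with the most competing merges, such as the "▁render" and all-spaces cases mentioned in the post.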

I am looking for publicly available logs with at least these columns: <Time Stamp> <Some ID> <Event ID -optional> <text message> Ideally logs of medical devices (where <some ID> would be a med device ID), but any multi source / event logs would be appreciated! Pls boost. Thanks!

Things I get unreasonably annoyed by: People who file an issue with the ACL Anthology because their page for "John Doe" contains papers from a different "John Doe" (legit!) but then write "please remove those papers from MY page." Like, what makes this any more YOUR page than the other John Doe’s?

This will be my first time ever not being able to vote in a German parliamentary election, and I'm not fully sure whose fault it is. My municipality sent the documents in time, but in a way that doesn't work with the Swedish system. 🧵/

Over. Done. For the first time in my life, I will not be able to cast my vote in a Bundestag election. Thousands of Germans living abroad are in the same situation. Why I am furious. 👇👇👇 www.rnd.de/politik/bund...

Coincidentally, I just realized I have broken my NYT Monday crossword streak because I wasn’t feeling well yesterday. Will I be motivated to pick it up again, or discouraged by the silly streak concept? We shall see next Monday 🙃

I'm starting to dislike that everything has "streaks". Duolingo, puzzle websites, the photography challenge I recently joined, ... I notice it makes me binge, get burnt out, and then stop completely, since what's the point of continuing once you've broken your streak?

if i ask you “How many bones are in your hand?” and you answer “27”, that’s because at some point you read that fact, not because you can introspect and sense how many bones you have. Asking an LLM about its training data, or the probability it assigns words is just like that.

⏰ Deadline extension: You can still apply for our postdoc position until 2025-02-12! That’s this Wednesday. Consider joining us in Sweden! 🤗 #nlp #nlproc

Ok all I want from @bsky.app is an algorithmic feed that allows me to explicitly filter out US political news. Science communities are only going to tolerate feeds dominated by funding outrages for so long (or we will keep doomscrolling forever)

We have a two year full-time position suitable for a postdoc with experience in NLP and a passion for mental health applications in an exciting collaborative project on analysing psychotherapy with NLP methods. Is this for you? Knowledge of Danish required. candidate.hr-manager.net/ApplicationI...

Sending out failing grades and getting emails from understandably disappointed students is probably my least favorite part of teaching

Hey Swedish, why is it “storleken på” and not “storleken av”, wtf

Grateful for my topic modeling and word embeddings training, which made me suspicious of any output that "looks good" but for which I haven't seen any alternative outputs that might also "look good." Running a prompt and getting output that looks good isn't sufficient evidence for a paper.

the best thing about "AI" is that it should never work at all and yet it works sometimes. but because i'm constantly being told that it works perfectly, my reaction when it works only occasionally is "damn, this thing sucks" instead of "holy shit this is incredible"

There's a petition to ban conversion practices (targeting LGBTQIA+ people) in the European Union. We need one million signatures, and we currently have 180,000+. If you live in Europe (even if you don't), spread the word! 🌈🔥✊ eci.ec.europa.eu/043/public/#...

I want to create a feed that is just my main following feed but with some terms filtered out. I don't want to use mute lists, because I still want to be able to access the unfiltered feed easily (i.e., through the feed list at the top of my timeline). Is there a way to do this easily? #bsky

With Google removing their Responsible AI Principles, they no longer state that they will *not* engage in "Technologies whose purpose contravenes widely accepted principles of international law and human rights". Concerns about surveillance and injury are also erased. ai.google/responsibili...

In terms of doomscrolling potential, my Bsky feed is now just as terrible as what Twitter used to be. I guess you just can’t escape the current state of the world.

One week left to apply for a postdoc with us! ⏳⤵️

When I grew up in Sweden in the 1990s the government distributed the book “Tell Ye Your Children”. Its purpose is to remind people how fascism, and seeing oneself as better than others can lead to inhuman evil. I think about it often and recommend you read it too.