#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).

Comments