A nice analysis of different tokenization strategies (BPE, WordPiece, SentencePiece) on protein sequences.

https://arxiv.org/abs/2411.17669
