Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou
tl;dr: increasing the input vocabulary consistently helps; increasing the output vocabulary helps bigger models.
https://arxiv.org/abs/2501.16975