There's a known bug in how we compute "word" probabilities with subword-based LMs that mark beginnings of words, as pointed out by Byung-doh Oh and Will Schuler, and by @tpimentel.bsky.social and Clara Meister.
I'm pleased to announce that minicons now includes a fix which runs batch-wise!
Comments
Oh and Schuler: https://aclanthology.org/2024.emnlp-main.202/
Pimentel and Meister (I've only implemented the beginning-of-word (bow) fix so far, not the other ones yet...): https://aclanthology.org/2024.emnlp-main.1020/
Fix brought to you by post-EMNLP blues and a sudden urge to write code!
Code in screenshot available here: https://github.com/kanishkamisra/minicons/blob/master/examples/bow-correction.py
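For anyone who wants the gist without opening the script, here's a rough toy sketch of the bow correction as I understand it from Pimentel and Meister: multiply the usual product of subword probabilities by the probability that the *next* token starts a new word, and divide by the probability that the word's first token started a new word. This is not the minicons API. The vocabulary, the `TOY_LM` table, and the helper names are all made up for illustration, and treating `<eos>` as a word boundary is an assumption.

```python
# Toy vocabulary: "Ġ" marks a beginning-of-word (bow) token, GPT-2 style.
def is_bow(token):
    # Assumption: end-of-sequence also closes the current word,
    # so it is grouped with the bow tokens.
    return token.startswith("Ġ") or token == "<eos>"

# Hypothetical hand-written next-token distributions, keyed by context.
TOY_LM = {
    ("Ġthe",): {"Ġthe": 0.05, "Ġcat": 0.6, "s": 0.05, "Ġsat": 0.2, "<eos>": 0.1},
    ("Ġthe", "Ġcat"): {"Ġthe": 0.1, "Ġcat": 0.05, "s": 0.3, "Ġsat": 0.5, "<eos>": 0.05},
}

def bow_mass(dist):
    """Total probability that the next token begins a new word."""
    return sum(p for tok, p in dist.items() if is_bow(tok))

def word_prob(context, word_tokens, correct=True):
    """Probability of a word (a list of subword tokens) given a context."""
    ctx = tuple(context)
    # Mass on "the previous word has ended" before we start this word.
    denom = bow_mass(TOY_LM[ctx])
    prob = 1.0
    for tok in word_tokens:
        prob *= TOY_LM[ctx][tok]
        ctx = ctx + (tok,)
    if not correct:
        return prob  # the usual (buggy) product of subword probabilities
    # Mass on "this word has ended" after its last subword.
    numer = bow_mass(TOY_LM[ctx])
    return prob * numer / denom

naive = word_prob(("Ġthe",), ["Ġcat"], correct=False)
fixed = word_prob(("Ġthe",), ["Ġcat"], correct=True)
print(naive, fixed)
```

In this toy setup the naive score for "cat" (0.6) also counts continuations like "cat" + "s"; the corrected score is 0.6 × 0.7 / 0.95, so it comes out lower.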