This #EMNLP2024 best paper https://aclanthology.org/2024.emnlp-main.300/ shows large gains over its (somewhat weak) baseline in determining whether a given document was in an LLM's pre-training data. Progress on an important problem.