Data contamination (validation data leaking into training data) is a critical problem for the whole approach of using static performance benchmarks to test large AI models. This issue deserves more attention. This article, for example, completely neglects it: https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html
