I am often invited to review papers on deep learning for medical images. Unfortunately, many papers make the same mistake: they split the data into training/validation/test sets on the slice/image/patch level instead of on the patient level. This leads to inflated test scores, as images from the same
Comments
https://link.springer.com/chapter/10.1007/978-3-030-68763-2_13
https://www.nature.com/articles/s41597-022-01618-6
If you use deep learning for digital pathology, the patches have to be split on the patient level, not on the patch level.
If you use 2D networks on 3D data, the 2D slices have to be split on the patient level, not on the slice level.
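
For anyone wondering what a patient-level split looks like in practice, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function name `split_by_patient`, the 70/15/15 fractions, and the toy patient IDs are all made up for this example and do not come from the papers linked above. The idea is to shuffle and divide the patients first, and only then map each patient's slices or patches into the corresponding set.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign sample indices to train/val/test so that no patient spans two sets.

    patient_ids: one patient ID per slice/patch/image (hypothetical IDs here).
    fractions:   share of *patients* (not samples) per set; assumed values.
    """
    # Group sample indices by the patient they come from.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle the patients, not the individual samples.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = int(round(fractions[0] * len(patients)))
    n_val = int(round(fractions[1] * len(patients)))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each set of patients back into sample indices.
    return {name: [i for pid in pids for i in by_patient[pid]]
            for name, pids in groups.items()}


# Toy example: 10 patients with 5 slices each (made-up IDs).
patient_ids = [f"patient_{p:02d}" for p in range(10) for _ in range(5)]
splits = split_by_patient(patient_ids)

# Sanity check: no patient appears in more than one set.
seen = {name: {patient_ids[i] for i in idx} for name, idx in splits.items()}
assert not (seen["train"] & seen["val"])
assert not (seen["train"] & seen["test"])
assert not (seen["val"] & seen["test"])
```

If you already use scikit-learn, `GroupShuffleSplit` and `GroupKFold` should give you the same guarantee when you pass the patient IDs as the `groups` argument to `split`. Either way, it is worth keeping an explicit overlap check like the one above in your pipeline.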