This is a classic example of _why_ choose-one-of-n datasets need large-scale, crowd-sourced label statistics, and why they should be scored with the KL divergence against those statistics instead of cross-entropy against a single label.
Reviewers will be more biased than a crowd: a handful of reviewers is a high-variance, high-bias estimator of the true label, and that can harm research.
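To make the KL vs. cross-entropy point concrete, here is a minimal sketch in NumPy. The vote counts, the four-class setup, and the helper names `cross_entropy` and `kl_divergence` are made up for illustration and are not from any particular dataset or library. Note that the two losses differ only by the entropy of the label distribution, which is constant with respect to the model, so the gradients are the same; what changes is that KL bottoms out at zero when the model matches the crowd, while cross-entropy bottoms out at the entropy of the labels.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), with p the label distribution."""
    return -np.sum(p * np.log(q + eps))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps)))

# Hypothetical crowd votes for one ambiguous image over 4 classes.
votes = np.array([6, 3, 1, 0])
p_crowd = votes / votes.sum()               # soft label: [0.6, 0.3, 0.1, 0.0]

# Hypothetical single reviewer who happened to pick class 1.
p_single = np.array([0.0, 1.0, 0.0, 0.0])

# A model whose prediction matches the crowd distribution exactly.
q = p_crowd.copy()

kl_at_match = kl_divergence(p_crowd, q)     # essentially zero
ce_at_match = cross_entropy(p_crowd, q)     # equals the entropy of p_crowd
ce_vs_single = cross_entropy(p_single, q)   # -log(0.3): penalized against one hard label

print(f"KL vs crowd:         {kl_at_match:.4f}")
print(f"CE vs crowd:         {ce_at_match:.4f}")
print(f"CE vs single review: {ce_vs_single:.4f}")
```

In this toy case the model is as good as it can be relative to the crowd, yet it still gets a nonzero cross-entropy, and a noticeably worse one when scored against the single reviewer's hard label.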
Comments
One paper that covered this, https://arxiv.org/abs/2103.14749, estimated the CIFAR-10 test-set label error rate at about 0.54%.
This is quite a lot! It's also part of what made testing speedrun results more difficult.