This is a classic example of _why_ choose-one-of-n datasets need to have large-scale, crowd-sourced statistics and should use the KL-divergence instead of cross-entropy.

Reviewers will be more biased than a crowd, it's a high variance+bias estimator, it can harm research.
Post image

Comments