fernbear.bsky.social • 21 days ago

This is a classic example of _why_ choose-one-of-n datasets need to have large-scale, crowd-sourced statistics and should use the KL-divergence instead of cross-entropy.

Reviewers will be more biased than a crowd. A small reviewer pool is a high-variance, high-bias estimator of the true label distribution, and that can harm research.

Comments

fernbear.bsky.social•21 days ago

Variance is a problem when testing models: it lengthens the iterative research cycle, because you need to run more experiments to tell a real improvement from noise.
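
A rough sketch of why variance costs experiments, using a textbook normal approximation rather than anything from the thread. The function name `runs_needed` and the example numbers (run-to-run standard deviation and target difference, both in accuracy percentage points) are hypothetical, and the estimate ignores statistical power:

```python
import math

def runs_needed(run_std, min_diff, z=1.96):
    """Approximate runs per configuration so that a gap of `min_diff`
    between two configurations' mean accuracy exceeds `z` standard errors.

    With n runs each, the standard error of the difference of the two
    means is run_std * sqrt(2 / n); solving z * SE <= min_diff for n
    gives the expression below.
    """
    return math.ceil(2 * (z * run_std / min_diff) ** 2)

# Hypothetical numbers, in accuracy percentage points.
print(runs_needed(run_std=0.15, min_diff=0.10))  # 18
print(runs_needed(run_std=0.30, min_diff=0.10))  # 70
```

Doubling the noise roughly quadruples the number of runs, which is where the longer research cycle comes from.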

One paper that covered this, https://arxiv.org/abs/2103.14749, estimated the label error rate of the CIFAR-10 test set to be about 0.54%.

fernbear.bsky.social•21 days ago

When aiming for 94% accuracy (~6% error rate), this means that up to about 9% of the remaining errors (0.54 / 6) land on labels that are "bad" from a cross-entropy perspective.

This is quite a lot! And it is part of what made testing speedrun results more difficult.
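
The 9% figure is just the ratio of the two error rates quoted in the thread. A minimal sketch of the arithmetic (the variable names are illustrative only):

```python
# Numbers quoted in the thread.
label_error_rate = 0.0054   # estimated CIFAR-10 label error rate (0.54%)
target_error_rate = 0.06    # model error rate at 94% accuracy

# Share of the model's remaining error budget that could be label noise.
noise_share = label_error_rate / target_error_rate
print(f"{noise_share:.1%}")  # 9.0%
```

This treats every mislabeled example as one the model is scored wrong on, so it reads as an upper bound rather than an exact count.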

fernbear.bsky.social•21 days ago

Having (a good set of) crowdsourced values for a KL divergence would reduce this variance a bit, and also would give a better value to measure against, due to not being as noisy (in both bias _and_ variance -- a bit of a messy combo to deal with).
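
A minimal sketch of the difference, assuming a made-up crowd vote distribution over three classes for a single ambiguous image (the vote shares and function names are hypothetical). It uses the identity KL(target || pred) = cross-entropy(target, pred) - entropy(target):

```python
import math

def cross_entropy(target, pred):
    """Cross-entropy H(target, pred) in nats, skipping zero-mass targets."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def entropy(target):
    """Shannon entropy H(target) in nats."""
    return -sum(t * math.log(t) for t in target if t > 0)

def kl_divergence(target, pred):
    """KL(target || pred) = H(target, pred) - H(target)."""
    return cross_entropy(target, pred) - entropy(target)

# Hypothetical example: the crowd splits its votes, one reviewer picks class 0.
crowd_label = [0.70, 0.25, 0.05]
hard_label = [1.0, 0.0, 0.0]

# A model whose prediction matches the crowd distribution exactly.
pred = [0.70, 0.25, 0.05]

kl_vs_crowd = kl_divergence(crowd_label, pred)
ce_vs_hard = cross_entropy(hard_label, pred)

print(f"KL vs crowd label:           {kl_vs_crowd:.4f}")
print(f"cross-entropy vs hard label: {ce_vs_hard:.4f}")
```

Under these assumptions the KL divergence against the crowd label is zero for a model that matches the crowd, while cross-entropy against the single reviewer's hard label still reports a loss of -log(0.70), even though there is nothing left for the model to fix.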

fernbear.bsky.social•21 days ago

Having cross-entropy as a default is great, and is really nice for unlabeled data since it all tends to fall out pretty nicely w.r.t. the learning process, but it is inherently (and necessarily) a much more expensive way to learn a target distribution of values.
