This paper masks out principal components instead of RGB patches because, with patch masking,
(1) visible pixels may be redundant with masked ones, or
(2) visible pixels may not be predictive of masked regions.
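
To make the idea concrete, here is a rough NumPy sketch of masking in PCA space rather than pixel space. It is only my reading of the one-line summary above, not the paper's code: the function name `pca_mask`, the `mask_ratio` parameter, and the uniform random choice of which components to drop are all assumptions.

```python
import numpy as np

def pca_mask(images, mask_ratio=0.5, seed=0):
    """Split images into a visible part and a masked part in PCA space.

    Hypothetical helper: the random choice of components and the
    mask_ratio parameter are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X = images.reshape(len(images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = Vt.shape[0]

    # Pick a random subset of components to hide.
    masked = rng.permutation(k)[: int(round(mask_ratio * k))]
    keep = np.ones(k, dtype=bool)
    keep[masked] = False

    coeffs = Xc @ Vt.T                      # per-image PCA coefficients
    visible = (coeffs * keep) @ Vt + mean   # what the model would see
    target = (coeffs * ~keep) @ Vt          # what it would have to predict
    return visible.reshape(images.shape), target.reshape(images.shape)

# Toy data standing in for a batch of small RGB images.
images = np.random.default_rng(1).random((16, 8, 8, 3))
visible, target = pca_mask(images)
```

Every visible and masked part here spans the whole image, so no spatial region is fully hidden or fully shown; the two parts sum back to the original input.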

+38% on classification tasks.

I wonder how much CroCo & *ST3R might benefit from this.
https://arxiv.org/abs/2502.06314