Some caveats: all DL models are trained with a batch size of 1024, whereas we recommend 256 for RealMLP on medium-sized datasets. Other choices (dataset selection, not using bagging, choice of metrics, search spaces for baselines) can of course also influence results. 2/
et al. This highlights the benchmarking problems in the field (and potentially the difficulty of using many of these models correctly). The situation is slowly improving. 4/
RealMLP: https://github.com/dholzmueller/pytabkit 5/
https://bsky.app/profile/dholzmueller.bsky.social/post/3lba4alreok23