Interesting! Would be cool to have these datasets on OpenML as well so they are easy to use in tabular benchmarks.
Here are some more recommendations for stronger tabular baselines:
1. For CatBoost and XGBoost, you'd want at least early stopping to select the best iteration. Using my library https://github.com/dholzmueller/pytabkit, you could, for example, use `CatBoost_TD_Regressor(n_cv=5)`, which will use better default parameters for regression, train five models in a cross-validation setup, select the best iteration for each, and ensemble them. The library offers the same for XGBoost and LightGBM.
2. The library also includes some of the best tabular DL models, such as RealTabR, TabR, RealMLP, and TabM, that could be interesting to try. (ModernNCA is also very good but not included.)
3. Finally, if you just want the best performance for a given (large) time budget, AutoGluon combines many tabular models. It does not include some of the latest models (yet), but it has a very good CatBoost, for example, and will likely outperform individual models.
This would be a nice "baseline" to have for the associated Polaris benchmark: https://polarishub.io/benchmarks/biogen/adme-fang-reg-v1.
We're aiming to serve as a source of truth for machine learning in drug discovery. For context, see: https://polarishub.io/blog/reproducible-machine-learning-in-drug-discovery-how-polaris-serves-as-a-single-source-of-truth
Would love to hear your feedback!