I collected some folk knowledge for RL and stuck it in my lecture slides a couple of weeks back: https://web.mit.edu/6.7920/www/lectures/L18-2024fa-Evaluation.pdf#page=55 See Appendix B... sorry, I know, the appendix of a lecture slide deck is not the best place for discovery. Suggestions very welcome.
This is awesome, thanks! 🙏 Forwarding to my students immediately!
I have a small note on something that is a pet peeve of mine: when tuning hyperparameters, make sure to tune and report on different seeds! I think newbies especially might miss that, but it can make a difference of up to a factor of 8 as far as I've seen.
That can also help! My point is more about the fact that by tuning, we're inducing an optimization bias (even with grid search, I'd say), so usually your performance will look much better on the exact setting you tune on.
The problem is that, just as with any other optimization, generalization to other settings is then limited and not necessarily predictable, potentially leading to much better or much worse performance. That's the difference between the similarly colored bars in this plot.
So basically, reporting the direct outcome of tuning is like reporting only training performance. It's better practice in the AutoML community to use a validation setting (e.g. fresh seeds) instead, to get a more realistic picture of the algorithm's performance.
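In case it helps to make this concrete, here's a minimal sketch of the pattern: tune on one set of seeds, then re-evaluate the chosen config on fresh seeds and report that number. Everything here is illustrative; `run_experiment`, the search space, and the seed lists are hypothetical stand-ins for your actual training/evaluation pipeline.

```python
import itertools
import random
import statistics

def run_experiment(config: dict, seed: int) -> float:
    """Stand-in for a real train-and-evaluate run; replace with your own pipeline."""
    rng = random.Random(hash((config["lr"], config["gamma"], seed)))
    return rng.gauss(100.0, 10.0)  # pretend final return of the trained agent

SEARCH_SPACE = {"lr": [1e-4, 3e-4, 1e-3], "gamma": [0.99, 0.995]}
TUNING_SEEDS = [0, 1, 2]                 # seeds used during the search
VALIDATION_SEEDS = [10, 11, 12, 13, 14]  # fresh seeds, never seen while tuning

def configs(space):
    """Enumerate all hyperparameter combinations (plain grid search)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

# Tuning: pick the config with the best mean return across the tuning seeds.
best_config = max(
    configs(SEARCH_SPACE),
    key=lambda cfg: statistics.mean(run_experiment(cfg, s) for s in TUNING_SEEDS),
)

# Reporting: re-run the chosen config on fresh seeds and report that number,
# not the (optimistically biased) score it achieved on the tuning seeds.
tuning_score = statistics.mean(run_experiment(best_config, s) for s in TUNING_SEEDS)
validation_score = statistics.mean(run_experiment(best_config, s) for s in VALIDATION_SEEDS)
print(f"best config: {best_config}")
print(f"score on tuning seeds:     {tuning_score:.1f}  (optimistic)")
print(f"score on validation seeds: {validation_score:.1f}  (report this)")
```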