tkorem.bsky.social
Microbiome, metagenomics, ML, and reproductive health. All views are mine. So are all your base
71 posts
466 followers
413 following
Regular Contributor
Active Commenter
comment in response to
post
Congrats!
comment in response to
post
A hopefully paywall-free link:
www.nature.com/articles/s41...
comment in response to
post
DEBIAS-M is available as a Python package (korem-lab.github.io/DEBIAS-M/ or just pip install debias-m). It works with any microbiome read count or relative abundance matrix and any paired metadata. 7/7
comment in response to
post
Its multi-task version allows DEBIAS-M to learn models for multiple tasks at the same time, further increasing its performance. This is particularly useful for tasks such as metabolite level predictions, where we want to predict multiple metabolite levels using the same microbiome data. 6/7
comment in response to
post
Finally, DEBIAS-M is designed for machine learning pipelines: it not only supports holding out labels for a test set, it also has an online learning mode that can handle completely new data on the fly (to our knowledge, the only method that allows this for microbiome data). 5/7
comment in response to
post
Next, the changes DEBIAS-M makes to the data are interpretable and explained by differences in experimental protocols. Analyzing the biases inferred for these 17 gut microbiome studies in HIV, we found that 84% of the variance can be explained by just three experimental factors. 4/7
comment in response to
post
This results in several benefits. First, in diverse benchmarks - using metagenomics and 16S sequencing, vaginal and gut microbiomes, and phenotypic and metabolite predictions - DEBIAS-M outperforms alternative methods. Here is an example for a gut 16S-based HIV classification across 17 studies. 3/7
comment in response to
post
DEBIAS-M is based on the multiplicative bias model of McLaren et al. (elifesciences.org/articles/46923). Under this model, every experimental protocol has different biases for each microbe. We infer the biases that maximize cross-batch association with phenotypes and minimize batch effects. 2/7
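To make the model concrete, here is a minimal numpy sketch of that multiplicative bias model (the efficiency values are invented purely for illustration): each protocol measures each taxon with its own efficiency, so the same specimen is reported differently by different protocols.

```python
import numpy as np

# True relative abundances of 4 taxa in one specimen.
true_abund = np.array([0.40, 0.30, 0.20, 0.10])

# Per-taxon measurement efficiencies of two hypothetical protocols
# (illustrative numbers only).
protocol_a = np.array([1.0, 0.5, 2.0, 1.0])
protocol_b = np.array([0.2, 1.0, 1.0, 3.0])

def observe(true_rel_abund, efficiency):
    """Multiplicative bias model: scale each taxon by its efficiency, renormalize."""
    biased = true_rel_abund * efficiency
    return biased / biased.sum()

print(observe(true_abund, protocol_a))  # what protocol A would report
print(observe(true_abund, protocol_b))  # what protocol B would report
# The same specimen looks different under each protocol; inferring such
# per-protocol efficiencies is what makes measurements comparable across batches.
```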
comment in response to
post
Congratulations!
comment in response to
post
My next study section was canceled well beyond Feb 2. (20-21)
comment in response to
post
It's really well done
comment in response to
post
A thread by Megan explaining the work
bsky.app/profile/mega...
comment in response to
post
Maturity is necessary but not sufficient imo. Take the most successful TT faculty, give them less infrastructure and slash their funding by >50% - they'll be less successful. It's a handicap.
comment in response to
post
Another small point: keep in mind that not having a postdoc advisor is one less person advocating for you, and one less recognizable name on your CV/pedigree (which really has an outsized impact in certain settings). Not that I am a big proponent of postdocs, but that’s for another thread.
comment in response to
post
Even if you did amazing considering your resources, and better than you would've as a postdoc, in many settings (search committees, study sections, etc.) you'd be compared to assistant professors who got bigger start-ups and better access to infrastructure and students.
comment in response to
post
There aren’t a lot of chances for starting your independent group. You want to go as far as you can, as fast as you can, with your very best ideas. In these positions, you often don’t have enough resources to do this, and you're often also not eligible to apply for funding.
comment in response to
post
These positions hire early (often out of PhD), are not tenure track (there's an end date), and provide limited funding (usually enough to hire 2-3 folks). They're appealing: they're competitive and prestigious, and you get more money and less supervision. But I find they're often a(n unintentional) trap.
comment in response to
post
Just told my partner yesterday that even if I had another two weeks between Saturday and Sunday I would still be late on a few deadlines come Monday
comment in response to
post
Can you elaborate?
comment in response to
post
Importantly - we'd love to hear your comments, feedback, and GitHub issues! In particular, let us know if there's additional prior work on this topic that we should note.
comment in response to
post
But CV is used not just for evaluation but also for hyperparameter tuning, and distributional bias impacts HPs that affect regression to the mean. For example, we show that it biases toward weaker model regularization, which might affect generalization and downstream deployment.
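For illustration, here is a generic sklearn-style setup of the kind described here: hyperparameters chosen by pooling LOOCV predictions and scoring them with auROC. This is a sketch of the setting, not the analysis from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Hyperparameter selection from pooled LOOCV predictions: the pooled auROC scores
# carry distributional bias, which can tilt the selected regularization strength C.
X, y = make_classification(n_samples=40, n_features=20, weights=[0.7], random_state=0)

scores = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    preds = cross_val_predict(
        LogisticRegression(C=C, max_iter=1000),
        X, y, cv=LeaveOneOut(), method="predict_proba",
    )[:, 1]
    scores[C] = roc_auc_score(y, preds)  # pooled across all LOO folds

best_C = max(scores, key=scores.get)
print(scores, "-> selected C:", best_C)
```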
comment in response to
post
With RebalancedCV we could see the "real-life" impact of distributional bias. We reproduced 3 recently published analyses that used LOOCV and showed that it under-evaluated performance in all of them. While the effect isn't major, it is consistent.
comment in response to
post
With this in mind, we developed RebalancedCV, an sklearn-compatible package that drops the minimal number of samples from the training set to maintain the same class balance across the training sets of all folds, thus resolving distributional bias. github.com/korem-lab/Re...
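A usage sketch, assuming the splitter is exposed as rebalancedcv.RebalancedLeaveOneOut with sklearn's standard splitter interface (the import path and class name are my assumptions; see the repo for the actual API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
# Assumed import; verify the actual name in github.com/korem-lab/RebalancedCV
from rebalancedcv import RebalancedLeaveOneOut

X, y = make_classification(n_samples=40, n_features=20, weights=[0.7], random_state=0)

# Used like sklearn's LeaveOneOut, but each training fold has a few samples dropped
# so that the training-set class balance is identical across folds.
cv = RebalancedLeaveOneOut()
preds = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba"
)[:, 1]
print(roc_auc_score(y, preds))
```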
comment in response to
post
As the issue is caused by a shift in the class balance of the training set, distributional bias can be addressed with stratified CV - but only if your dataset allows the stratification to be exact. The less exact the stratification, the more bias you have (in this plot, closer to 0).
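A quick numpy/sklearn illustration of that point: the training-set label mean shifts across LOOCV folds, while stratified folds keep it nearly fixed, and exactly fixed only when the class counts divide evenly across folds.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, StratifiedKFold

# 17 negatives and 13 positives: class counts that do not split evenly into 5 folds.
y = np.array([0] * 17 + [1] * 13)
X = np.zeros((len(y), 1))  # features are irrelevant for this illustration

loo_means = [y[tr].mean() for tr, _ in LeaveOneOut().split(X, y)]
skf_means = [y[tr].mean() for tr, _ in StratifiedKFold(n_splits=5).split(X, y)]

print("LOOCV training means:   ", np.round(np.unique(loo_means), 3))
print("Stratified 5-fold means:", np.round(np.unique(skf_means), 3))
```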
comment in response to
post
Does this mean that past work with LOOCV is overinflated? Not quite. Most machine learning algorithms regress to the mean - not to its negative - and so they are actually _under_evaluated. That's the negative bias we started with!
comment in response to
post
Distributional bias is a severe information leakage - so severe that we designed a dummy model that can achieve perfect auROC/auPR in ANY binary classification task evaluated via LOOCV (even without features). How? It just outputs the negative mean of the training set labels!
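A minimal sketch of that dummy model in plain numpy/sklearn (not code from the paper): it never looks at the features, yet the pooled LOOCV auROC comes out perfect.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=50)   # random labels, no signal at all
X = rng.normal(size=(50, 10))     # features are never used

scores = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X, y):
    scores[test_idx] = -y[train_idx].mean()  # the only "information" used

print(roc_auc_score(y, scores))  # 1.0 -- purely an artifact of distributional bias
```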
comment in response to
post
The issue is that every time one holds out a sample as a test set in LOOCV, the mean of the training set labels shifts slightly, creating a perfect negative correlation across the folds between that mean and the test labels. We call this phenomenon distributional bias:
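A few lines of numpy/sklearn confirm the perfect negative correlation described here:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=50)
X = np.zeros((50, 1))  # features don't matter here

# For each LOO fold, record the training-set label mean and the held-out label.
train_means, test_labels = [], []
for train_idx, test_idx in LeaveOneOut().split(X, y):
    train_means.append(y[train_idx].mean())
    test_labels.append(y[test_idx][0])

print(np.corrcoef(train_means, test_labels)[0, 1])  # -1.0
```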
comment in response to
post
This story begins with benchmarking we did for some of our machine learning pipelines. We used random data, so we expected to see random classification accuracy (auROC=0.5). Instead, we found a clear negative bias that got worse with more imbalanced datasets:
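A small simulation in the same spirit (not the paper's benchmark): with purely random features and labels, pooled LOOCV auROC for a default logistic regression lands below the expected 0.5, and the gap tends to grow as the class balance becomes more skewed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
for pos_frac in [0.5, 0.3, 0.1]:
    aurocs = []
    for _ in range(20):
        X = rng.normal(size=(40, 10))
        y = (rng.random(40) < pos_frac).astype(int)
        # Need at least 2 samples per class so every LOO training fold has both classes.
        if y.sum() < 2 or y.sum() > len(y) - 2:
            continue
        preds = cross_val_predict(
            LogisticRegression(max_iter=1000), X, y,
            cv=LeaveOneOut(), method="predict_proba",
        )[:, 1]
        aurocs.append(roc_auc_score(y, preds))
    print(f"positive fraction {pos_frac}: mean pooled auROC = {np.mean(aurocs):.2f}")
```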
comment in response to
post
A bit of background: when training models on small datasets it's common to use LOOCV, as it maximizes the number of samples available for training. It also leaves a single sample for testing, meaning that many performance metrics (e.g., area under the ROC curve) require aggregation across folds/iterations.
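For readers less familiar with the pattern, here is a generic sklearn sketch of LOOCV with predictions pooled across folds before computing auROC (a single-sample test fold cannot yield an ROC curve on its own):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

X, y = make_classification(n_samples=30, n_features=10, random_state=0)

# Train on all samples but one, score the held-out sample, and pool the scores.
pooled_scores = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pooled_scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]

print(roc_auc_score(y, pooled_scores))  # auROC over the pooled LOO predictions
```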