chatgtp.bsky.social
Machine learning for molecular biology. ELLIS PhD student at Fabian Theis lab. EPFL alumnus.
27 posts
1,763 followers
4,096 following
comment in response to
post
At this point, the art of detecting which claims and publications are overhyped is a core research skill.
comment in response to
post
It was the first publication I had the chance to work on, back when I was an MSc student. I was lucky to be mentored by Slavica Dimitrieva, who led the project, and to work on it with Eric Durand. Both inspired me to continue on the bio-ML trajectory 🚀
comment in response to
post
The speaker was describing some situation of student misconduct and without any reason or justification mentioned the nationality of the student.
comment in response to
post
4️⃣ “A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types”
@chatgtp.bsky.social
neurips.cc/virtual/2024...
comment in response to
post
I’m told burnout comes less from having too much to do than from feeling like what you have to do is out of your control (and/or unpleasant). So, working out substantial changes in what you’re obligated to do is the best way out, making space for something new and interesting!
comment in response to
post
Thanks for compiling. Happy to join the list!
comment in response to
post
It wouldn't have been possible without the Kaggle competitors who contributed their solutions and our collaborators who helped implement them into the platform. 🙏
comment in response to
post
Thanks to a great co-lead Andrew Benz, supervisors Daniel Burkhardt, Malte Luecken, @fabiantheis.bsky.social, help with OP from Robrecht Cannoodt, and everyone involved!
@chanzuckerberg.bsky.social and Cellarity for funding to generate data, Kaggle for competition, and SaturnCloud for compute. 🧵8/8
comment in response to
post
The best models' predictions are still far from ground truth, but we anticipated this room for growth: the platform is a living benchmark, and new methods can easily be integrated into the leaderboard via contributions on GitHub github.com/openproblems... . We're open to suggestions! 🧵7/8
comment in response to
post
We implemented the winning Kaggle competition methods in our Open Problems Perturbation Prediction (OP3) platform. It has a robust evaluation with baseline methods and dataset bootstrapping. Simple NNs (with a few caveats) perform best, and drugs with larger effects are harder to predict. 🧵6/8
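As a rough illustration of dataset bootstrapping for robust evaluation (the platform's actual procedure may differ; the function and metric below are illustrative, not from OP3), a metric can be resampled over observations to estimate its uncertainty:

```python
import numpy as np

def bootstrap_scores(y_true, y_pred, metric, n_boot=100, seed=0):
    """Bootstrap a metric over samples to get a mean and uncertainty estimate."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        scores.append(metric(y_true[idx], y_pred[idx]))
    return float(np.mean(scores)), float(np.std(scores))

# toy usage: mean absolute error of a perfect prediction is 0 with 0 spread
mae = lambda t, p: float(np.mean(np.abs(t - p)))
y = np.arange(10.0)
m, s = bootstrap_scores(y, y, mae)
```

Resampling the evaluation set like this gives confidence intervals for leaderboard scores instead of a single point estimate.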
comment in response to
post
We used this setup in a Kaggle competition (25k submissions, 1.3k competitors). It sourced models and feedback from competitors, which we used to refine the dataset and benchmark: filtering, cell type annotation, and estimation of perturbation effects.
competition: www.kaggle.com/competitions... 🧵5/8
comment in response to
post
Single-cell perturbation readouts have batch effects and a low signal-to-noise ratio. DEG analysis with GLMs and replicates helps, but we still need to choose a representation of perturbation effects, so we developed a “cross-donor retrieval” metric to evaluate such representations. 🧵4/8
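The post doesn't spell out the metric, but one plausible sketch of a "cross-donor retrieval" score is nearest-neighbour retrieval of the same perturbation's effect across donors (the function name and the cosine-similarity choice are illustrative assumptions, not the paper's exact definition):

```python
import numpy as np

def cross_donor_retrieval(effects_a, effects_b):
    """Fraction of perturbations in donor A whose nearest neighbour
    (by cosine similarity) among donor B's effects is the same perturbation.
    effects_a, effects_b: (n_perturbations, n_genes), rows aligned."""
    a = effects_a / np.linalg.norm(effects_a, axis=1, keepdims=True)
    b = effects_b / np.linalg.norm(effects_b, axis=1, keepdims=True)
    sim = a @ b.T  # cosine similarity between all perturbation pairs
    return float(np.mean(sim.argmax(axis=1) == np.arange(len(a))))

# toy usage: a noisy replicate of donor A's effects should retrieve well
rng = np.random.default_rng(0)
effects_a = rng.normal(size=(20, 50))                      # 20 perturbations, 50 genes
effects_b = effects_a + 0.1 * rng.normal(size=(20, 50))    # noisy "second donor"
score = cross_donor_retrieval(effects_a, effects_b)
```

The intuition: a good effect representation should make the same perturbation look similar across donors, despite donor-specific batch effects.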
comment in response to
post
We generated a single-cell dataset of 146 drug perturbations in PBMCs of 3 human donors. We used it to benchmark perturbation effect predictions for held-out (cell type, compound) pairs. Perturbation effects are derived from DEG analysis: treatment-vs-control contrasts in a generalized linear model. 🧵3/8
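A minimal sketch of a per-gene treatment-vs-control contrast with a linear model (ordinary least squares stands in here for the GLM actually used; all names and data are illustrative):

```python
import numpy as np

def perturbation_effect(expr, treated):
    """Estimate per-gene treatment effects with a linear model.

    expr:    (n_cells, n_genes) log-normalized expression
    treated: (n_cells,) boolean treatment indicator
    Returns the treatment coefficient per gene
    (the treatment-vs-control contrast)."""
    X = np.column_stack([np.ones(len(treated)), treated.astype(float)])
    # least-squares fit of expr ~ 1 + treatment for all genes at once
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return beta[1]  # row 1: treatment coefficient

# toy usage: gene 0 is upregulated by treatment, genes 1-2 are not
rng = np.random.default_rng(0)
treated = np.repeat([False, True], 50)
expr = rng.normal(size=(100, 3))
expr[treated, 0] += 2.0
effects = perturbation_effect(expr, treated)
```

With an intercept and a treatment indicator, the fitted coefficient reduces to the difference in group means; a real DEG pipeline would add donor/batch covariates and a count-appropriate GLM family.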
comment in response to
post
The chemical and biological space of possible perturbations is very large. Thus, methods try to learn from a fraction of possible experiments and infer the rest. However, existing perturbation datasets are limited in size and suffer from data quality issues. 🧵2/8
comment in response to
post
Genomics, Evolution, and More @jlsteenwyk.bsky.social bsky.app/starter-pack...
comment in response to
post
I'm a PhD student at Theislab, working on ML applications in omics with a focus on small-molecule perturbation modeling. I'm interested in applications of the above to cancer treatment (FPM).
comment in response to
post
For a list of sc-transformers with descriptions, check out github.com/theislab/sin....
(7/7)
comment in response to
post
While sc transformers are large compared to other sc models, they are tiny compared to LLMs: 650M vs 405B params. One way to leverage other diverse and abundant data is by training and using LLMs on sc tasks. (6/7)
comment in response to
post
The appeal of transformers is generalization across a variety of tasks and data, yet we highlight that in independent benchmarks they often lag behind specialized architectures. Maybe there's not enough diverse data; maybe we need different data preprocessing or models. (5/7)
comment in response to
post
Non-sequential (tabular) omics data requires preprocessing, and there are many possibilities. We highlight 3 dominant approaches: rank-based (iSEEEK, Geneformer), value binning (scBERT, scGPT), and value projection (TOSICA, CellPLM). (4/7)
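The three preprocessing schemes can be sketched on a toy expression vector (the bin edges, embedding dimension, and projection matrix below are illustrative assumptions, not the cited models' actual settings):

```python
import numpy as np

expr = np.array([0.0, 5.2, 1.1, 3.3])  # toy expression values for 4 genes

# 1) rank-based (Geneformer-style): order gene IDs by decreasing expression
rank_tokens = np.argsort(-expr)

# 2) value binning (scBERT/scGPT-style): discretize values into bin tokens
bin_edges = np.array([0.5, 2.0, 4.0])
bin_tokens = np.digitize(expr, bin_edges)

# 3) value projection (TOSICA/CellPLM-style): project each scalar to a vector
d_model = 8
W = np.random.default_rng(0).normal(size=(1, d_model))
value_embeddings = expr[:, None] @ W  # (n_genes, d_model)
```

Rank-based inputs discard magnitudes but are robust to normalization; binning keeps coarse magnitudes as a discrete vocabulary; projection keeps continuous values but gives up a token vocabulary.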
comment in response to
post
Unlike the autoencoders popular in the field, transformers take as input a set or a variable-length sequence of embeddings. Transformers rely on the attention mechanism and can be trained with MLM or NTP, but neither of these yields per-cell embeddings. (3/7)
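One common workaround for the missing per-cell embedding (a generic pooling trick, not specific to any model in the review) is to pool the per-token transformer outputs, e.g. by mean pooling:

```python
import numpy as np

# token_embeddings: transformer output, one vector per gene token
# (random stand-in here; a real model would produce these)
token_embeddings = np.random.default_rng(1).normal(size=(2000, 64))

# mean pooling collapses the variable-length token sequence
# into a single fixed-size per-cell embedding
cell_embedding = token_embeddings.mean(axis=0)
```

Alternatives include a dedicated [CLS]-style summary token or max pooling; the key point is that the pooling step is added on top of the MLM/NTP-trained backbone.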
comment in response to
post
For 7 years now, transformers have been taking over more and more fields, from NLP through image and speech processing to protein folding. Is it THE architecture for modeling non-sequential single-cell omics as well? Maybe we just need to make the data sequential? (2/7)
comment in response to
post
Even worse, it means the code is not sufficient to reproduce the scores even if you run it from scratch.