goringennady.bsky.social
🦠🧬📊 bioinformatics, statistics, and stochastic processes.
263 posts
277 followers
93 following
Regular Contributor
Active Commenter
comment in response to post
It is optimistic to say this is common knowledge in the field
comment in response to post
Regressing on x/y and y? Got it. Anyway, excited to someday see a reanalysis with more recent data, and a check of the model's stability with respect to Belarus 2020
comment in response to post
Makes perfect sense! But I am not familiar with this language: if the original outcome data do not play into the plot, where do the deviations from the parametric model fit come from?
comment in response to post
tedious!
comment in response to post
"AI-generated suggestion," "image search that does not find images," and "a week when Google Search refuses to serve the correct results from a particular domain."
comment in response to post
Google has never taken feedback, and the introduction of Google Lens was reviled by many, many people, because Chrome replaced the shortcut to Google Image Search, which could find image matches, with something that could not
comment in response to post
Even if you are excluding artifact cells, it will still show up as DE, predominantly because the expression is so high that a statistical test will have a lot of evidence to conclude it's DE. There are other, subtler reasons related to distributional assumptions too, I think.
comment in response to post
Exhaustive, pointed, and unfun critiques of methods are a good thing: they mean the field can settle down from the Wild West into the more esoteric statistical debates ("what's a good model here?" instead of "what's a replicate?"). I don't want a party, I want the right answer at a reasonable cost
comment in response to post
in the future all scientific computation will rely on the 0x5F3759DF fast invsqrt
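(For the uninitiated: that constant is the magic number from the Quake III fast inverse square root. A minimal Python transcription of the classic bit hack, with the usual single Newton step; a sketch for illustration, not production code:)

```python
import struct

def fast_invsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) via the 0x5F3759DF bit hack (single precision)."""
    # Reinterpret the float's bits as a 32-bit unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The famous "what the..." line: shift and subtract from the magic constant.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration sharpens the initial guess.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_invsqrt(4.0))  # ~0.499, vs. the exact 0.5
```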
comment in response to post
bsky.app/profile/gori... I wonder which point this is under: the model requires a particular substrate (e.g. perfect-precision data) which cannot be obtained even in principle.
comment in response to post
perhaps a pithier formulation
comment in response to post
Re "all models are wrong but some are useful":
Useful models are designed with a good match between the model architecture and the scientific question, aka the ML task. A big gap between the model and what biologists mean by the cell communication questions makes for a model that is both useless and likely wrong.
comment in response to post
🤷‍♀️ It is more than a little dubious.
comment in response to post
miscalibration in Fig. 2; on the other, they seem OK with it in Fig. 3 as an induced lfc bound (?) (I think I'm just missing something here). But it certainly seems outré (relative to the literature) either way.
comment in response to post
absolutely, and it is applied on top of a different flavor of double-dipping (using the same genes to cluster and to do differential expression).
Now, what the effects of double-dipping through filtering are, I don't know. Probably worse FDR control. On one hand, Bourgon et al. show it leads to
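The cluster-then-test flavor is easy to demonstrate. A toy sketch (my own minimal construction, not Bourgon et al.'s setup): cluster pure noise on the same genes you then test, and "DE genes" materialize out of nothing.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 200 "cells" x 50 "genes" of pure Gaussian noise: no real groups, no real DE.
X = rng.normal(size=(200, 50))

# Double dip: define the clusters from the same matrix...
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ...then test each gene between those clusters.
pvals = np.array([stats.ttest_ind(X[labels == 0, g], X[labels == 1, g]).pvalue
                  for g in range(X.shape[1])])

# A valid procedure would leave ~5% of p-values below 0.05; here it is far
# more, because the clustering already selected for separation on these genes.
print((pvals < 0.05).mean())
```

Nothing here is specific to k-means or the t-test; any cluster-then-test pipeline on the same features inherits the selection effect.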
comment in response to post
p-values that are mostly <1e-100, because the authors downloaded Seurat and made a few changes to a tutorial script to do DEA. The story is, then, between unreliable and meaningless for all but the strongest signal.
It's been getting better, so cheerleading for bad stats is more than disturbing.
comment in response to post
But I'm not a statistician! If a statistician wants to do this leg work, more power to them; maybe they will bring a sea change to how stats is done in genomics. Seems like an easy way to have a lot of impact!
comment in response to post
I can elaborate on that, too (although this is certainly not exhaustive)
comment in response to post
Even if there is validation, it is also not great to adamantly and visibly ignore best practices (if the simplest analysis is plain wrong, are the more complex/new ones reliable?), and to ignore all signal other than the strongest visible in naive analysis (again, these are not cheap experiments).
comment in response to post
It's great when there is time and money to do validation! But more typically there isn't, and correct stats are the difference between the results being potentially interesting and worth following up on, and the results being unusable. These are not cheap experiments.
comment in response to post
Here is the typical scenario: there is an interesting paper on a unique system or human subjects, with a modest n. The authors did not and will never release data. The biological story is interesting. It would be worth consideration if the analysis were done right. But it is based on a table with
comment in response to post
3. None of this matters because sc is exploratory and everything should be validated anyway. This is great in theory, but a bit troubling that there is so much will to spend millions of dollars on experiments, seq kits, and GPUs, and none on ensuring the results are reliable on their own.
comment in response to post
papers reporting issues with pseudoreplication: Squair 2021, yes, but also Zimmerman 2021 and associated correspondence, and Junttila 2022, alongside best-practices reviews. It is easy but tedious to come up with adversarial examples where artifacts lead to meaninglessly low p-values on a per-cell basis.
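One such adversarial sketch (a hypothetical setup of mine, not taken from any of the papers above): a subject-level artifact, no treatment effect at all, and a cell-level test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 4 subjects per condition, 500 cells each. No treatment effect anywhere;
# only subject-level shifts (batch, dissociation, depth -- pick your artifact).
subject_means = rng.normal(0, 1, size=8)
cells = np.repeat(subject_means, 500) + rng.normal(0, 1, size=8 * 500)
condition = np.repeat([0, 0, 0, 0, 1, 1, 1, 1], 500)

# Cell-level test: n = 2000 "independent" units per group.
print(stats.mannwhitneyu(cells[condition == 0], cells[condition == 1]).pvalue)
# Typically astronomically small, even though the subject-level null is true.

# Pseudobulk: aggregate to n = 4 subjects per group, a correctly sized test.
pseudobulk = cells.reshape(8, 500).mean(axis=1)
print(stats.mannwhitneyu(pseudobulk[:4], pseudobulk[4:]).pvalue)
```

The pseudobulk aggregation recovers the actual experimental unit; the cell-level test answers a different (and uninteresting) question with an absurd effective sample size.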
comment in response to post
2. Treating cells as independent experimental units is OK because everybody does it, and even if we don't, the conclusions still stand (at least for Penk).
I do not think pseudobulk is the best possible approach (nor do I think this particular flavor of pseudobulk is great). But there are many
comment in response to post
Speaking of rank-based methods: this is kind of a funny reference, because (5) says Wilcoxon is okay, if risky, for marker genes, but bad for DGE.
comment in response to post
Sometimes this nonindependence matters, sometimes it doesn't. It seems to be less important for rank-based methods. But those are low-power and do not really take advantage of the data.
comment in response to post
It does not take a lot of work to confirm (by drawing random variates from the NB) that yes, the DESeq2 filter (or a simpler implementation, throwing out low-expression genes) is independent (preserves the p-value distribution) and the fold change procedure is not. Here be dragons.
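Roughly the check I mean, as a sketch (a toy version: an ordinary t-test standing in for the DESeq2 pipeline, and ad hoc filter thresholds of my own):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_genes, n = 20000, 5
# Null NB counts: every gene has the same distribution in both groups.
mu = rng.lognormal(2, 1, n_genes)   # per-gene mean expression
theta = 0.5                         # common NB dispersion parameter
counts = rng.negative_binomial(theta, theta / (theta + mu[:, None]),
                               size=(n_genes, 2 * n))

g1, g2 = counts[:, :n], counts[:, n:]
pvals = stats.ttest_ind(g1, g2, axis=1).pvalue   # crude per-gene test

# Filter 1: overall mean expression -- independent of the test statistic.
keep_mean = counts.mean(axis=1) > 10
# Filter 2: observed fold change -- NOT independent of the test statistic.
lfc = np.log2((g1.mean(axis=1) + 1) / (g2.mean(axis=1) + 1))
keep_lfc = np.abs(lfc) > 0.5

# Under the null, surviving p-values should stay roughly uniform:
print((pvals[keep_mean] < 0.05).mean())  # stays near 0.05
print((pvals[keep_lfc] < 0.05).mean())   # far above 0.05 -- dragons
```

The asymmetry is the whole point: the mean filter never looks at the group difference, while the fold-change filter is nearly a monotone function of the very statistic being tested.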
comment in response to post
filtering in e.g. DESeq2. But that approach has a lot of statistical machinery behind it, discussed in great detail in 2010, and independence of the filtering criterion and the test statistic seems to be mandatory.
comment in response to post
This seems like an amazing and novel approach to doing NHST I have never once seen in a paper before. So perhaps the right solution here is to write a paper benchmarking and validating the method instead of baldly insisting it makes sense.
Intuitively, it looks superficially similar to independent
comment in response to post
It's no good
comment in response to post
which is to say: ideas certainly are cheap! But when the range of implemented products (I do not even ask for usable ones) is so spectacularly undiverse, there is either (1) ideation failure or (2) implementation failure, across a whole field, for a decade. I suspect (1), but neither is good
comment in response to post
or for that matter
John's peer in famous foursome (0, e.g.; 4)
comment in response to post
if one prefers so-called "legal" clues:
John's peer in famous foursome (0-0-5)
(may or may not be unconsciously stolen from a @frisco17.bsky.social Oh No!-meration)