paulfharrison.bsky.social
Bioinformatician at Monash University, Melbourne, Australia.
I also use mastodon: @pfh@mastodon.online
https://mastodon.online/@pfh
My homepage is:
https://logarithmic.net/pfh/
On Twitter I was: @paulfharrison
51 posts
374 followers
114 following
comment in response to
post
... Bluesky seems to have thrown out most of the detail in my second video, so here's the original:
logarithmic.net/pfh-files/ra...
comment in response to
post
Also the optimizer has not found the lowest energy state, which would be all-dark or all-light. It might take a very long time to reach one of these optima!
comment in response to
post
The optimizer tries to find the lowest energy. This closely resembles "maximum likelihood" or "maximum a posteriori" estimation in statistics. We might hope this finds the most representative estimate. Clearly, here it does not! The estimate is smoother than most samples from the distribution.
comment in response to
post
Second, sampling from the distribution with a Langevin Dynamics simulation. The algorithm is almost identical to gradient descent with momentum, but we add just the right amount of noise to the momentum at each step.
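Not the actual code; a minimal sketch of one such step, assuming an energy gradient function grad_U() and illustrative tuning parameters step and a:

# Sketch only: one step of Langevin dynamics with momentum.
# grad_U() is an assumed function giving the gradient of the
# energy; step and a are hypothetical tuning parameters.
langevin_step <- function(x, p, grad_U, step = 0.01, a = 0.9) {
  # Partially refresh the momentum, adding exactly enough noise
  # that p keeps a standard normal stationary distribution:
  p <- a * p + sqrt(1 - a^2) * rnorm(length(p))
  # The rest is gradient descent with momentum (a leapfrog step):
  p <- p - 0.5 * step * grad_U(x)
  x <- x + step * p
  p <- p - 0.5 * step * grad_U(x)
  list(x = x, p = p)
}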
comment in response to
post
Also, each "with_" function has a corresponding "local_" function that stacks into your current environment (a function body or local({ ... })).
This is syntactic sugar to flatten nested resource usage, much like the pipe %>% flattens nested function calls. Makes coding easier for humans.
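For example, with the withr package (which follows this naming convention; the directory and options chosen here are just illustrative):

library(withr)

# Nested style: with_ functions wrap an expression.
with_dir(tempdir(), {
  with_options(list(digits = 3), {
    print(getwd())
  })
})

# Flattened style: local_ versions clean up automatically when
# the enclosing function or local({ ... }) block exits.
local({
  local_dir(tempdir())
  local_options(digits = 3)
  print(getwd())
})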
comment in response to
post
Some finicky details. Each time I re-read R. M. Neal's chapter here I pick up something new. It helps if the Hamiltonian step is symplectic and time-reversible! Also, I'm not doing Metropolis-Hastings proposal rejection, which is what makes my method approximate.
www.mcmchandbook.net/HandbookChap...
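For reference, a sketch of the Metropolis-Hastings accept/reject step that would make such a sampler exact, assuming an energy function U() and a proposed (x_new, p_new) from a reversible, volume-preserving step:

# Sketch only: accept/reject for a Hamiltonian-style proposal.
# U() is an assumed energy function (negative log density).
accept_step <- function(x, p, x_new, p_new, U) {
  H_old <- U(x) + sum(p^2) / 2          # total energy before
  H_new <- U(x_new) + sum(p_new^2) / 2  # total energy after
  if (runif(1) < exp(H_old - H_new)) {
    list(x = x_new, p = p_new)          # accept the proposal
  } else {
    # Reject: with partial momentum refresh, negating the
    # momentum is needed to preserve reversibility.
    list(x = x, p = -p)
  }
}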
comment in response to
post
Right now I mainly need to find a fun application. Langevin Tours with langevitour are fun but only use tens of parameters. This should be applicable to millions of parameters.
comment in response to
post
Also interesting to compare causal DAG-based reasoning to Hill's criteria. That is, they are quite hard to compare, and I think Hill's criteria are a fair description of a lot of important decision making, so maybe there is a wider context.
pmc.ncbi.nlm.nih.gov/articles/PMC...
comment in response to
post
Also there's an interesting package called Ckmeans.1d.dp that calculates the exactly optimal k-means clustering in 1D.
cran.r-project.org/web/packages...
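A quick usage sketch, with toy data (not from the original post):

# Exactly optimal 1D k-means with the Ckmeans.1d.dp package:
library(Ckmeans.1d.dp)

x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
result <- Ckmeans.1d.dp(x, k = 2)  # guaranteed optimal in 1D
result$cluster   # cluster assignment for each point
result$centers   # cluster means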
comment in response to
post
The optimal k-means clustering has nice properties for quantizing a large cloud of points, with coverage of lower-density regions. However, getting near the optimum depends heavily on the initialization. There's a variant called k-means++ with better initialization.
logarithmic.net/pfh-files/bl...
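A rough sketch of the k-means++ idea (not the author's code): each new center is drawn with probability proportional to its squared distance from the centers chosen so far.

# Sketch of k-means++ initialization; points is a matrix with one
# row per point. The chosen centers then seed ordinary k-means.
kmeanspp <- function(points, k) {
  n <- nrow(points)
  centers <- points[sample(n, 1), , drop = FALSE]
  while (nrow(centers) < k) {
    # Squared distance from each point to its nearest center so far
    d2 <- apply(points, 1, function(p)
      min(colSums((t(centers) - p)^2)))
    # Sample the next center with probability proportional to d2
    centers <- rbind(centers, points[sample(n, 1, prob = d2), ])
  }
  kmeans(points, centers = centers)
}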
comment in response to
post
Some reflections on the series here.
logarithmic.net/pfh/blog/017...
comment in response to
post
10/ TL;DR:
✔️ Under the null, p-values are uniform (flat distribution).
✔️ A "spike near 0" suggests true effects.
✔️ A "U-shape" signals artifacts—time to troubleshoot.
Plot your p-values. They’ll tell you more than you think!
Have questions? Let’s chat 👇
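A quick simulation showing the flat and spiked shapes (illustrative only):

# Simulated p-values: flat under the null, spiked with real effects.
set.seed(1)
null_p   <- replicate(2000, t.test(rnorm(10))$p.value)
effect_p <- replicate(2000, t.test(rnorm(10, mean = 1))$p.value)
par(mfrow = c(1, 2))
hist(null_p,   breaks = 20, main = "Null: flat")
hist(effect_p, breaks = 20, main = "True effects: spike near 0")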
comment in response to
post
This mostly comes from working with one particular lab. At least one person has reproduced it with their own R code. They actually added the red and blue coloring, which I thought was redundant but does add a lot of visual impact. I am going to need to add it to the new version.
comment in response to
post
The lesson I've learned is to be very careful writing code that produces plots.
If there's an easy function, or some code to copy from a vignette, people tend to use it without much thought.
comment in response to
post
Here's a new version of the plot. There is one point per gene! The y-axis shows the estimated log fold change, and the color shows the confidence bound. I lose a little resolution by using color, but hopefully gain clarity. I am hoping it is less confusing and more conventional.
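Not the actual plotting code; a rough ggplot2 sketch of the design, with made-up column names (mean_expr, lfc, and confect for the confidence bound):

library(ggplot2)

# Made-up data standing in for per-gene results:
set.seed(1)
genes <- data.frame(mean_expr = runif(1000, 0, 15),
                    lfc = rnorm(1000, sd = 0.5))
genes$confect <- ifelse(abs(genes$lfc) > 1,
                        sign(genes$lfc) * (abs(genes$lfc) - 0.5), NA)

# One point per gene; color carries the confidence bound,
# with non-significant genes left NA (grey).
ggplot(genes, aes(x = mean_expr, y = lfc, color = confect)) +
  geom_point(size = 0.5) +
  labs(x = "Mean expression", y = "Estimated log fold change",
       color = "Confidence\nbound")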
comment in response to
post
Here's the plot. It's looking at differential gene expression. There are two types of points. The gray dots show the estimated log fold change on the y-axis. The colored points show a confidence bound on the log fold change on the y-axis. A significant gene is represented by two different points!
comment in response to
post
This was one of the important ideas:
comment in response to
post
Great 'cos I just started this one. go.bsky.app/NmhwfbN
comment in response to
post
So I'm thinking it might be possible to take a policy like BH, which may not be quite right in a particular setup, and *recalibrate* it based on simulation or resampling. For example, a Tukey all-pairs comparison version of FDR, where the test statistics aren't independent. Or gene-set enrichment.
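Very roughly, the idea might look like this, with a hypothetical simulate() function generating p-values under a realistic dependence structure along with which hypotheses are truly null:

# Rough sketch of checking a BH cutoff by simulation.
# simulate() is hypothetical: it returns a list with p (p-values)
# and is_null (logical, TRUE for truly null hypotheses).
realized_fdr <- function(simulate, alpha = 0.05, reps = 1000) {
  fdp <- replicate(reps, {
    sim <- simulate()
    hits <- p.adjust(sim$p, method = "BH") <= alpha
    if (any(hits)) mean(sim$is_null[hits]) else 0
  })
  mean(fdp)  # estimated false discovery rate at this cutoff
}
# alpha could then be adjusted until realized_fdr() hits the target.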
comment in response to
post
Prompted by a case where the slope really should have been 1. geom_smooth made it look shallower, and even made it seem like the data should be broken into groups.
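A toy illustration of the effect, assuming the cause was noise in x (regression attenuation):

# Toy data where the underlying slope is 1, but noise in x
# attenuates the fitted slope:
library(ggplot2)
set.seed(1)
true <- rnorm(200)
d <- data.frame(x = true + rnorm(200), y = true + rnorm(200))
ggplot(d, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm") +            # fitted slope is ~0.5
  geom_abline(slope = 1, intercept = 0)   # the true relationship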
comment in response to
post
The link in 5 is broken for me.
I think it's changed to
academic.oup.com/clinchem/art...
comment in response to
post
Adding some speculation: roughly, GLMs are unbiased on a linear scale even when the model is on a different scale such as log. This perhaps isn't ideal. Maybe it is better to be unbiased on a log(ish) scale, like the voom-limma method. This somewhat protects you from large positive outliers.
comment in response to
post
Um. Does the Wilcoxon rank-sum test not handle ties well? I thought there was a correction for ties.
Context: We use this test a lot with scRNA-Seq data, which has a lot of zeros.
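For example, in base R:

# Base R's wilcox.test() falls back to a normal approximation
# with a tie correction when ties are present:
x <- c(0, 0, 0, 1, 2)
y <- c(0, 0, 3, 4, 5)
wilcox.test(x, y)  # warns "cannot compute exact p-value with ties"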
comment in response to
post
Congratulations to everyone involved -- Adele Barugahare, Nitika Kandhari, Scott Coutts, Andrew Perry, and especially Laura Perlaza-Jimenez for actually making it happen.
monashbioinformaticsplatform.github.io/RNAseq_works...
comment in response to
post
This had been an idea for quite a while, and I'm really pleased with how it turned out. When we do command-line or R workshops, we spend a lot of time on the how. With web tools we could really dig into the meaning behind the plots. Scott was also able to share detailed experience on library preparation.