paulfharrison.bsky.social
Bioinformatician at Monash University, Melbourne, Australia.
I also use mastodon: @pfh@mastodon.online
https://mastodon.online/@pfh
My homepage is:
https://logarithmic.net/pfh/
On Twitter I was: @paulfharrison
51 posts
374 followers
114 following
comment in response to
post
... Bluesky seems to have thrown out most of the detail in my second video, so here's the original:
logarithmic.net/pfh-files/ra...
comment in response to
post
Also the optimizer has not found the lowest energy state, which would be all-dark or all-light. It might take a very long time to reach one of these optima!
comment in response to
post
The optimizer tries to find the lowest energy. This closely resembles "maximum likelihood" or "maximum a posteriori" estimation in statistics. We might hope this finds the most representative estimate. Clearly, here it does not! The estimate is smoother than most samples from the distribution.
comment in response to
post
Second, sampling from the distribution with a Langevin Dynamics simulation. The algorithm is almost identical to gradient descent with momentum, but we add just the right amount of noise to the momentum at each step.
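Not the actual code; a minimal sketch of one such step, assuming an energy gradient function grad_U() and illustrative tuning parameters step and a:

# Sketch only: one step of Langevin dynamics with momentum.
# grad_U() is an assumed function giving the gradient of the
# energy; step and a are hypothetical tuning parameters.
langevin_step <- function(x, p, grad_U, step = 0.01, a = 0.9) {
  # Partially refresh the momentum, adding exactly enough noise
  # that p keeps a standard normal stationary distribution:
  p <- a * p + sqrt(1 - a^2) * rnorm(length(p))
  # The rest is gradient descent with momentum (a leapfrog step):
  p <- p - 0.5 * step * grad_U(x)
  x <- x + step * p
  p <- p - 0.5 * step * grad_U(x)
  list(x = x, p = p)
}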
comment in response to
post
Also, each "with_" function has a corresponding "local_" function that stacks into your current environment (a function body or local({ ... })).
This is syntactic sugar to flatten nested resource usage, much like the pipe %>% flattens nested function calls. Makes coding easier for humans.
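For example, with the withr package (which follows this naming convention; the directory and options chosen here are just illustrative):

library(withr)

# Nested style: with_ functions wrap an expression.
with_dir(tempdir(), {
  with_options(list(digits = 3), {
    print(getwd())
  })
})

# Flattened style: local_ versions clean up automatically when
# the enclosing function or local({ ... }) block exits.
local({
  local_dir(tempdir())
  local_options(digits = 3)
  print(getwd())
})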
comment in response to
post
Some finicky details. Each time I re-read R. M. Neal's chapter here I pick up something new. It helps if the Hamiltonian step is symplectic and time-reversible! Also, I'm not doing Metropolis-Hastings proposal rejection, which is what makes my method approximate.
www.mcmchandbook.net/HandbookChap...
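For reference, a sketch of the Metropolis-Hastings accept/reject step that would make such a sampler exact, assuming an energy function U() and a proposed (x_new, p_new) from a reversible, volume-preserving step:

# Sketch only: accept/reject for a Hamiltonian-style proposal.
# U() is an assumed energy function (negative log density).
accept_step <- function(x, p, x_new, p_new, U) {
  H_old <- U(x) + sum(p^2) / 2          # total energy before
  H_new <- U(x_new) + sum(p_new^2) / 2  # total energy after
  if (runif(1) < exp(H_old - H_new)) {
    list(x = x_new, p = p_new)          # accept the proposal
  } else {
    # Reject: with partial momentum refresh, negating the
    # momentum is needed to preserve reversibility.
    list(x = x, p = -p)
  }
}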
comment in response to
post
Right now I mainly need to find a fun application. Langevin Tours with langevitour are fun but only use tens of parameters. This should be applicable to millions of parameters.
comment in response to
post
Also interesting to compare causal DAG-based reasoning to Hill's criteria. That is, they are quite hard to compare, and I think Hill's criteria are a fair description of a lot of important decision making, so maybe there is a wider context.
pmc.ncbi.nlm.nih.gov/articles/PMC...
comment in response to
post
Also there's an interesting package called Ckmeans.1d.dp that calculates the exactly optimal k-means clustering in 1D.
cran.r-project.org/web/packages...
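A quick usage sketch, with toy data (not from the original post):

# Exactly optimal 1D k-means with the Ckmeans.1d.dp package:
library(Ckmeans.1d.dp)

x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
result <- Ckmeans.1d.dp(x, k = 2)  # guaranteed optimal in 1D
result$cluster   # cluster assignment for each point
result$centers   # cluster means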
comment in response to
post
The optimal k-means clustering has nice properties for quantizing a large cloud of points, with coverage of lower-density regions. However, getting near the optimum depends heavily on the initialization. There's a variant called k-means++ with better initialization.
logarithmic.net/pfh-files/bl...
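A rough sketch of the k-means++ idea (not the author's code): each new center is drawn with probability proportional to its squared distance from the centers chosen so far.

# Sketch of k-means++ initialization; points is a matrix with one
# row per point. The chosen centers then seed ordinary k-means.
kmeanspp <- function(points, k) {
  n <- nrow(points)
  centers <- points[sample(n, 1), , drop = FALSE]
  while (nrow(centers) < k) {
    # Squared distance from each point to its nearest center so far
    d2 <- apply(points, 1, function(p)
      min(colSums((t(centers) - p)^2)))
    # Sample the next center with probability proportional to d2
    centers <- rbind(centers, points[sample(n, 1, prob = d2), ])
  }
  kmeans(points, centers = centers)
}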
comment in response to
post
Some reflections on the series here.
logarithmic.net/pfh/blog/017...
comment in response to
post
10/ TL;DR:
✔️ Under the null, p-values are uniform (flat distribution).
✔️ A "spike near 0" suggests true effects.
✔️ A "U-shape" signals artifacts—time to troubleshoot.
Plot your p-values. They’ll tell you more than you think!
Have questions? Let’s chat 👇
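A quick simulation showing the flat and spiked shapes (illustrative only):

# Simulated p-values: flat under the null, spiked with real effects.
set.seed(1)
null_p   <- replicate(2000, t.test(rnorm(10))$p.value)
effect_p <- replicate(2000, t.test(rnorm(10, mean = 1))$p.value)
par(mfrow = c(1, 2))
hist(null_p,   breaks = 20, main = "Null: flat")
hist(effect_p, breaks = 20, main = "True effects: spike near 0")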
comment in response to
post
This mostly comes from working with one particular lab. At least one person has reproduced it with their own R code. They actually added the red and blue coloring, which I thought was redundant but does add a lot of visual impact. I am going to need to add it to the new version.
comment in response to
post
The lesson I've learned is to be very careful writing code that produces plots.
If there's an easy function, or some code to copy from a vignette, people tend to use it without much thought.
comment in response to
post
Here's a new version of the plot. There is one point per gene! The y-axis shows the estimated log fold change, and the color shows the confidence bound. I lose a little resolution by using color, but hopefully gain clarity. I am hoping it is less confusing and more conventional.
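Not the actual plotting code; a rough ggplot2 sketch of the design, with made-up column names (mean_expr, lfc, and confect for the confidence bound):

library(ggplot2)

# Made-up data standing in for per-gene results:
set.seed(1)
genes <- data.frame(mean_expr = runif(1000, 0, 15),
                    lfc = rnorm(1000, sd = 0.5))
genes$confect <- ifelse(abs(genes$lfc) > 1,
                        sign(genes$lfc) * (abs(genes$lfc) - 0.5), NA)

# One point per gene; color carries the confidence bound,
# with non-significant genes left NA (grey).
ggplot(genes, aes(x = mean_expr, y = lfc, color = confect)) +
  geom_point(size = 0.5) +
  labs(x = "Mean expression", y = "Estimated log fold change",
       color = "Confidence\nbound")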
comment in response to
post
Here's the plot. It's looking at differential gene expression. There are two types of points. The gray dots show the estimated log fold change on the y-axis. The colored points show a confidence bound on the log fold change on the y-axis. A significant gene is represented by two different points!
comment in response to
post
This was one of the important ideas:
comment in response to
post
Great 'cos I just started this one. go.bsky.app/NmhwfbN
comment in response to
post
So I'm thinking it might be possible to take a policy like BH, which may not be quite right in a particular setup, and *recalibrate* it based on simulation or resampling. For example, a Tukey all-pairs comparison version of FDR, where the test statistics aren't independent. Or gene-set enrichment.
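Very roughly, the idea might look like this, with a hypothetical simulate() function generating p-values under a realistic dependence structure along with which hypotheses are truly null:

# Rough sketch of checking a BH cutoff by simulation.
# simulate() is hypothetical: it returns a list with p (p-values)
# and is_null (logical, TRUE for truly null hypotheses).
realized_fdr <- function(simulate, alpha = 0.05, reps = 1000) {
  fdp <- replicate(reps, {
    sim <- simulate()
    hits <- p.adjust(sim$p, method = "BH") <= alpha
    if (any(hits)) mean(sim$is_null[hits]) else 0
  })
  mean(fdp)  # estimated false discovery rate at this cutoff
}
# alpha could then be adjusted until realized_fdr() hits the target.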
comment in response to
post
Prompted by a case where the slope really should have been 1. geom_smooth made it look shallower, and even made it seem like the data should be broken into groups.
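A toy illustration of the effect, assuming the cause was noise in x (regression attenuation):

# Toy data where the underlying slope is 1, but noise in x
# attenuates the fitted slope:
library(ggplot2)
set.seed(1)
true <- rnorm(200)
d <- data.frame(x = true + rnorm(200), y = true + rnorm(200))
ggplot(d, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm") +            # fitted slope is ~0.5
  geom_abline(slope = 1, intercept = 0)   # the true relationship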
comment in response to
post
The link in 5 is broken for me.
I think it's changed to
academic.oup.com/clinchem/art...
comment in response to
post
Adding some speculation: roughly, GLMs are unbiased on a linear scale even when the model is on a different scale such as log. This perhaps isn't ideal. Maybe it is better to be unbiased on a log(ish) scale, like the voom-limma method. This somewhat protects you from large positive outliers.
comment in response to
post
Um. Does the Wilcoxon rank-sum test not handle ties well? I thought there was a correction for ties.
Context: We use this test a lot with scRNA-Seq data, which has a lot of zeros.
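For example, in base R:

# Base R's wilcox.test() falls back to a normal approximation
# with a tie correction when ties are present:
x <- c(0, 0, 0, 1, 2)
y <- c(0, 0, 3, 4, 5)
wilcox.test(x, y)  # warns "cannot compute exact p-value with ties"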
comment in response to
post
Congratulations to everyone involved -- Adele Barugahare, Nitika Kandhari, Scott Coutts, Andrew Perry, and especially Laura Perlaza-Jimenez for actually making it happen.
monashbioinformaticsplatform.github.io/RNAseq_works...
comment in response to
post
This had been an idea for quite a while, and I'm really pleased with how it turned out. When we do command-line or R workshops, we spend a lot of time on the how. With web tools we could really dig into the meaning behind the plots. Scott was also able to share detailed experience on library preparation.