pausalz.bsky.social
Paul Zivich, Assistant (to the Regional) Professor
Computational epidemiologist, causal inference researcher, amateur mycologist, and open-source enthusiast.
https://github.com/pzivich
#epidemiology #statistics #python #episky #causalsky
411 posts
1,968 followers
511 following
comment in response to
post
I'm not, but you're right that I should organize them better. I should have learned my lesson about that from reading and posting Robins 86 on twitter lol
comment in response to
post
The larger ideas behind this approach (and an example of its application) are also provided in this paper
pmc.ncbi.nlm.nih.gov/articles/PMC...
comment in response to
post
Our full stack of estimating equations is the following. The first 2 are for the sens/spec of Y* for Y, the next 2 are for sens/spec of Y** for Y*. The 5th is the crude mean of Y**. Then the 6th and 7th are the iterative application of Rogan-Gladen estimator (double Rogan-Gladen)
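Written out roughly (the source indicator S and the α's are just my shorthand here: S=2 for the validation data with Y and Y*, S=1 for the validation data with Y* and Y**, S=0 for the main data with only Y**; α1/α2 are sens/spec of Y* for Y and α3/α4 are sens/spec of Y** for Y*), the stack looks something like
\[
\psi(O_i; \theta) =
\begin{bmatrix}
I(S_i = 2) \, Y_i \, (Y^*_i - \alpha_1) \\
I(S_i = 2) \, (1 - Y_i) \, \{(1 - Y^*_i) - \alpha_2\} \\
I(S_i = 1) \, Y^*_i \, (Y^{**}_i - \alpha_3) \\
I(S_i = 1) \, (1 - Y^*_i) \, \{(1 - Y^{**}_i) - \alpha_4\} \\
I(S_i = 0) \, (Y^{**}_i - \mu_{**}) \\
\mu_* (\alpha_3 + \alpha_4 - 1) - (\mu_{**} + \alpha_4 - 1) \\
\mu (\alpha_1 + \alpha_2 - 1) - (\mu_* + \alpha_2 - 1)
\end{bmatrix}
\]
where the last two rows are the Rogan-Gladen correction applied twice: first taking the crude mean of Y** to the mean of Y*, then taking the mean of Y* to the mean of Y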
comment in response to
post
Here, one validation data set has details on Y** and Y* (so partially corrects measurement error) and the other has Y* and Y (fully corrects). We don't know how Y** and Y are directly related. So, instead we are going to iteratively correct for our measurement error
comment in response to
post
So, think about the case where we have external data but it still isn't quite the gold-standard. Instead, we have a chain of data where we have two mis-measured versions of the outcome (Y**, Y*) and a gold-standard (Y)
comment in response to
post
If you're like "sure, we've had one, but what about a second Rogan-Gladen?" I'm happy to let you know that is considered here, where it is applied iteratively (maybe I'll come back to this one on another Monday)
pubmed.ncbi.nlm.nih.gov/39198907/
comment in response to
post
Code (in Python) is provided here
deli.readthedocs.io/en/latest/Ex...
comment in response to
post
The estimating functions for this example have been described in detail elsewhere. Here are a few references, with accompanying code in multiple languages
academic.oup.com/aje/article/...
pubmed.ncbi.nlm.nih.gov/38423105/
comment in response to
post
This estimating function is a little special, in that it actually does not directly depend on the data itself. As I've said before, we will return to the connection between the sandwich variance estimator and the delta method
comment in response to
post
Finally, we can translate the Rogan-Gladen estimator into an estimating function using a little algebra (left as an exercise for the reader, to be mildly annoying)
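(For anyone skipping the exercise: multiply through by the denominator and move everything to one side, giving roughly
\[
\psi(\theta) = \mu \, (Se + Sp - 1) - (\mu^* + Sp - 1)
\]
whose root is exactly the Rogan-Gladen corrected mean)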
comment in response to
post
To this, we can add the estimating function for the mean of the mismeasured variable in our main data. Below is this conditional probability added to the column vector
comment in response to
post
Sensitivity and specificity are then simply conditional means (or proportions). Here R=0 indicates the external observations we are using to estimate sensitivity and specificity
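As estimating functions, those two are roughly
\[
\begin{bmatrix}
(1 - R_i) \, Y_i \, (Y^*_i - Se) \\
(1 - R_i) \, (1 - Y_i) \, \{(1 - Y^*_i) - Sp\}
\end{bmatrix}
\]
where the leading (1 - R_i) restricts the contributions to the external (R=0) observations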
comment in response to
post
As should be the obvious theme by now, we can use M-estimators and the sandwich variance estimator to incorporate the variability from all the sources
So let's get our stack of estimating equations. For simplicity, we will assume that our sensitivity and specificity are transportable across contexts
comment in response to
post
The previous expression indicates that we know sensitivity and specificity perfectly, but this is almost never the case. Instead, we might have some external data available
So, we instead *estimate* sensitivity and specificity. As a result, our variance estimator should include that variability
comment in response to
post
An easy-to-apply method is the Rogan-Gladen estimator (no relation to Joe Rogan). The estimator is given by the following, where Y is the gold standard and Y* is the mismeasured version
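That is,
\[
\hat{\mu}_Y = \frac{\hat{\mu}_{Y^*} + Sp - 1}{Se + Sp - 1}
\]
where \hat{\mu}_{Y^*} is the mean of the mismeasured Y*, and Se, Sp are the sensitivity and specificity of Y* for Y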
comment in response to
post
As a primarily non-R person, I find that not having `return` makes R code harder to read (I want clarity in what a function is returning!!)
comment in response to
post
My thoughts aren't fully formed but it's something I have been thinking about. I wonder whether everything has to be 'exactly correct'. It seems like 'close enough' projections get us 'close' to the function? At least for the decimal places we report. So does it matter if it isn't exact?
comment in response to
post
I don't disagree with you. My statement was following from the use of parametric models for the nuisance functions (which remains common in epi). Yes, DR estimators offer a big advantage with more flexible modeling, and their use has increasingly picked up for that feature
comment in response to
post
But here the sandwich variance estimator IS doubly robust. So we can get both doubly robust point AND variance estimates
You can read more details in the following pre-print
arxiv.org/abs/2404.16166
comment in response to
post
The IF variance assumes that both models are correct. So our point estimator is doubly robust but the variance estimator is NOT. To me, this diminishes the motivation behind using doubly robust estimators...
comment in response to
post
Again we can use the sandwich for variance estimation
Those of you familiar with AIPW or doubly robust methods might know that we can also use an influence function (IF) variance estimator. So you might wonder why bother with the sandwich here
comment in response to
post
Step 4 is again just predicting some values from the model (doesn't require an estimating equation), so we only need Step 5. Step 5 is simply the mean of the predicted values from the outcome model (as is usual for g-computation). The following is the full stack
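Sketched out (taking a linear outcome model and writing \pi_i = expit(X_i^\top \gamma) for the missingness model, just for illustration), the stack is roughly
\[
\psi(O_i; \theta) =
\begin{bmatrix}
\{R_i - \pi_i\} X_i \\
\dfrac{R_i}{\pi_i} (Y_i - X_i^\top \beta) X_i \\
X_i^\top \beta - \mu
\end{bmatrix}
\]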
comment in response to
post
Step 2 is simply computing the weights (which doesn't need its own estimating equation). Step 3 is a weighted regression model. While we haven't done weighted regression yet, it is simply the product of the weight with the estimating function. Below, these are presented stacked together
comment in response to
post
So now let's translate this procedure into estimating equations. First is our logistic regression model for R|X. This is simply the same as the previous post
comment in response to
post
You can read a bit more about this weighted regression version of AIPW (in the context of confounding) in the following short commentary
I like this approach since it's super easy to implement and you don't need to memorize the AIPW formula
academic.oup.com/aje/article/...
comment in response to
post
To put these models together, I am going to do a simple trick based on weighted regression. Our process is:
(1) Fit a model for R given X
(2) Compute IPW
(3) Estimate a model for Y given X among R=1, weighted by the IPW
(4) Predict Y for all observations
(5) Take mean of Y
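For concreteness, here is a rough sketch of those 5 steps with delicatessen on made-up data (the toy data, the variable names, and the linear outcome model are all just placeholders for illustration, not the real analysis)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: X is the design matrix (with intercept), r indicates
# whether y was observed, and y is set to 0 where missing (those rows get
# zero weight below, so the filler value never matters)
n = 500
rng = np.random.default_rng(2023)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    gamma = theta[0:2]   # (1) missingness model parameters
    beta = theta[2:4]    # (3) outcome model parameters
    mu = theta[4]        # (5) corrected mean, the parameter of interest

    # (1) logistic model for R given X
    ee_rmod = ee_regression(gamma, X=X, y=r, model='logistic')
    # (2) inverse probability of missingness weights
    pi = inverse_logit(np.dot(X, gamma))
    ipw = r / pi
    # (3) weighted linear model for Y given X among those with Y observed
    ee_ymod = ee_regression(beta, X=X, y=y, model='linear', weights=ipw)
    # (4)-(5) predicted Y for everyone, then the mean of those predictions
    ee_mean = np.dot(X, beta) - mu
    return np.vstack([ee_rmod, ee_ymod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0., 0., 0.])
estr.estimate()
print(estr.theta[-1])                # doubly robust point estimate of the mean
print(estr.variance[-1, -1] ** 0.5)  # sandwich standard error for that estimate
```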
comment in response to
post
In the previous posts, we had two nuisance models: a model for the missingness process and a model for the outcome process. Here, we are going to put them together. The advantage of this approach is that as long as at least one of the models is correct, we will be unbiased
comment in response to
post
Unlike the usual 'robust' or 'GEE' trick used with IPW estimator, this variance estimator is not conservative. We will return to this in a future week
comment in response to
post
To complete our estimator, we can simply stack together the logistic regression estimating equations with our chosen weighted mean estimator. Again the sandwich gives us the variance directly
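A rough sketch of what that stack might look like with delicatessen (made-up data, Hajek version; names and data are just placeholders for illustration)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: r indicates whether y was observed, y is set to 0 where
# missing (the leading r in the Hajek equation zeroes those rows out anyway)
n = 500
rng = np.random.default_rng(51)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    gamma, mu = theta[0:2], theta[2]
    # Logistic model for R given X
    ee_rmod = ee_regression(gamma, X=X, y=r, model='logistic')
    # Inverse probability of missingness weights
    pi = inverse_logit(np.dot(X, gamma))
    # Hajek weighted mean: solves sum_i r_i (y_i - mu) / pi_i = 0
    ee_mean = r * (y - mu) / pi
    return np.vstack([ee_rmod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
print(estr.theta[2], estr.variance[2, 2] ** 0.5)  # Hajek mean and sandwich SE
```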
comment in response to
post
One item you might recall is that the Hajek is bounded in the parameter space, while the Horvitz-Thompson is not. Here, you can see this feature in the estimating equations. This results from where the difference between Y and \mu is computed
comment in response to
post
For the Hajek, we have a similar process of deriving the estimating equation
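Roughly,
\[
\psi(O_i; \mu) = \frac{R_i (Y_i - \mu)}{\pi(X_i; \gamma)}
\]
and setting the sum to zero and solving for \mu gives the familiar \sum_i R_i Y_i / \pi_i divided by \sum_i R_i / \pi_i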
comment in response to
post
For the Horvitz-Thompson, we can get the corresponding estimating equation after doing a little math (left as an exercise for the reader)
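Roughly,
\[
\psi(O_i; \mu) = \frac{R_i Y_i}{\pi(X_i; \gamma)} - \mu
\]
which, summed and set to zero, gives the usual \hat{\mu} = n^{-1} \sum_i R_i Y_i / \hat{\pi}_i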
comment in response to
post
To build our M-estimator, we first need a logistic regression model. Luckily for us, the logistic model has a simple score function (which gives us the estimating function directly), which is shown below
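That is, for \Pr(R = 1 | X) = expit(X^\top \gamma), the score is
\[
\psi(O_i; \gamma) = \{R_i - \text{expit}(X_i^\top \gamma)\} X_i
\]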
comment in response to
post
An IPW estimator may be implemented as:
(1) Estimate a logistic model for R given X
(2) Compute the IPW for missingness
(3) Take the Horvitz-Thompson or Hajek weighted mean
comment in response to
post
Why? Well it is because the sandwich variance 'automates' the delta method (we will return to why this is in a future week). This means we can estimate the variance of \mu while incorporating the uncertainty in estimating \beta. We don't have to bother with the bootstrap!!
comment in response to
post
Given that these are equivalent ways of programming the same estimator, why bother with the M-estimator?
The reason is that the M-estimator also gives us a way to estimate the variance directly. We can simply use the sandwich variance
comment in response to
post
Now that we have each of the pieces, we can simply stack the estimating functions into a column vector. This gives us our entire estimator, where all the pieces are estimated simultaneously
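A rough sketch of what that might look like with delicatessen on made-up data (linear outcome model; names and data are just placeholders for illustration)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: r indicates whether y was observed, y is set to 0 where
# missing (multiplying the regression equations by r removes those rows)
n = 500
rng = np.random.default_rng(7)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    beta, mu = theta[0:2], theta[2]
    # Step 1: linear model for Y given X, restricted to r == 1 by the leading r
    ee_ymod = r * ee_regression(beta, X=X, y=y, model='linear')
    # Steps 2-3: predicted Y for everyone, then the mean of those predictions
    ee_mean = np.dot(X, beta) - mu
    return np.vstack([ee_ymod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
print(estr.theta[2], estr.variance[2, 2] ** 0.5)  # corrected mean, sandwich SE
```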
comment in response to
post
Steps 2-3 can be done in the same estimating equation. We simply want the predicted values of Y and then the estimating equation for the mean. The following shows this notationally
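Roughly, for a linear outcome model,
\[
\psi(X_i; \beta, \mu) = X_i^\top \beta - \mu
\]
i.e., the predicted value for unit i minus the mean we're after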
comment in response to
post
That R at the start effectively allows only rows where Y is measured to have non-zero contributions to the estimating equations.
This covers Step 1
comment in response to
post
As we learned last time, a regression model for Y given X is pretty simple. However, we need to limit contributions to those units with a measured Y. If we let R=1 indicate that Y is observed, the corresponding estimating equation is
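Roughly, if we use a linear model for the outcome,
\[
\psi(O_i; \beta) = R_i (Y_i - X_i^\top \beta) X_i
\]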
comment in response to
post
That last step gives us the estimate of the mean of Y that accounts for missingness driven by the covariate set.
To turn this 3-step process into an M-estimator, we need to translate each step into estimating equations. Then we can simply stack them together
comment in response to
post
So, here is our procedure:
(1) Estimate a model for the variable with missing data (e.g., Y) given observed covariates we think are needed for exchangeability
(2) Using the estimated model, generate predicted values of the outcome for all observations
(3) Take the mean of those predicted values