pausalz.bsky.social
Paul Zivich, Assistant (to the Regional) Professor
Computational epidemiologist, causal inference researcher, amateur mycologist, and open-source enthusiast.
https://github.com/pzivich
#epidemiology #statistics #python #episky #causalsky
411 posts
1,968 followers
511 following
comment in response to
post
I'm not, but you're right that I should organize them better. I should have learned my lesson about that from reading and posting Robins 86 on twitter lol
comment in response to
post
The larger ideas behind this approach (and an example of its application) are also provided in this paper
pmc.ncbi.nlm.nih.gov/articles/PMC...
comment in response to
post
Our full stack of estimating equations is the following. The first 2 are for the sens/spec of Y* for Y, the next 2 are for sens/spec of Y** for Y*. The 5th is the crude mean of Y**. Then the 6th and 7th are the iterative application of Rogan-Gladen estimator (double Rogan-Gladen)
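Written out roughly (the source indicator S and the α's are just my shorthand here: S=2 for the validation data with Y and Y*, S=1 for the validation data with Y* and Y**, S=0 for the main data with only Y**; α1/α2 are sens/spec of Y* for Y and α3/α4 are sens/spec of Y** for Y*), the stack looks something like
\[
\psi(O_i; \theta) =
\begin{bmatrix}
I(S_i = 2) \, Y_i \, (Y^*_i - \alpha_1) \\
I(S_i = 2) \, (1 - Y_i) \, \{(1 - Y^*_i) - \alpha_2\} \\
I(S_i = 1) \, Y^*_i \, (Y^{**}_i - \alpha_3) \\
I(S_i = 1) \, (1 - Y^*_i) \, \{(1 - Y^{**}_i) - \alpha_4\} \\
I(S_i = 0) \, (Y^{**}_i - \mu_{**}) \\
\mu_* (\alpha_3 + \alpha_4 - 1) - (\mu_{**} + \alpha_4 - 1) \\
\mu (\alpha_1 + \alpha_2 - 1) - (\mu_* + \alpha_2 - 1)
\end{bmatrix}
\]
where the last two rows are the Rogan-Gladen correction applied twice: first taking the crude mean of Y** to the mean of Y*, then taking the mean of Y* to the mean of Y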
comment in response to
post
Here, one validation data set has details on Y** and Y* (so partially corrects measurement error) and the other has Y* and Y (fully corrects). We don't know how Y** and Y are directly related. So, instead we are going to iteratively correct for our measurement error
comment in response to
post
So, think about the case where we have external data but it still isn't quite the gold-standard. Instead, we have a chain of data where we have two mis-measured versions of the outcome (Y**, Y*) and a gold-standard (Y)
comment in response to
post
If you're like "sure, we've had one, but what about a second Rogan-Gladen?" I'm happy to let you know that is considered here, where it is applied iteratively (maybe I'll come back to this one on another Monday)
pubmed.ncbi.nlm.nih.gov/39198907/
comment in response to
post
Code (in Python) is provided here
deli.readthedocs.io/en/latest/Ex...
comment in response to
post
The estimating functions for this example have been described in detail elsewhere. Here are a few references, with accompanying code in multiple languages
academic.oup.com/aje/article/...
pubmed.ncbi.nlm.nih.gov/38423105/
comment in response to
post
This estimating function is a little special, in that it actually does not directly depend on the data itself. As I've said before, we will return to the connection between the sandwich variance estimator and the delta method
comment in response to
post
Finally, we can translate the Rogan-Gladen estimator into an estimating function using a little algebra (left as an exercise for the reader, to be mildly annoying)
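(For anyone skipping the exercise: multiply through by the denominator and move everything to one side, giving roughly
\[
\psi(\theta) = \mu \, (Se + Sp - 1) - (\mu^* + Sp - 1)
\]
whose root is exactly the Rogan-Gladen corrected mean)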
comment in response to
post
To this, we can add the estimating function for the mean of the mismeasured variable in our main data. Below is this conditional probability added to the column vector
comment in response to
post
Sensitivity and specificity are then simply conditional means (or proportions). Here R=0 indicates the external observations we are using to estimate sensitivity and specificity
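As estimating functions, those two are roughly
\[
\begin{bmatrix}
(1 - R_i) \, Y_i \, (Y^*_i - Se) \\
(1 - R_i) \, (1 - Y_i) \, \{(1 - Y^*_i) - Sp\}
\end{bmatrix}
\]
where the leading (1 - R_i) restricts the contributions to the external (R=0) observations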
comment in response to
post
As should be the obvious theme by now, we can use M-estimators and the sandwich variance estimator to incorporate the variability from all the sources
So let's get our stack of estimating equations. For simplicity, we will assume that our sensitivity and specificity are transportable across contexts
comment in response to
post
The previous expression indicates that we know sensitivity and specificity perfectly, but this is almost never the case. Instead, we might have some external data available
So, we instead *estimate* sensitivity and specificity. As a result, our variance estimator should include that variability
comment in response to
post
An easy-to-apply method is the Rogan-Gladen estimator (no relation to Joe Rogan). The estimator is given by the following, where Y is the gold standard and Y* is the mismeasured version
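That is,
\[
\hat{\mu}_Y = \frac{\hat{\mu}_{Y^*} + Sp - 1}{Se + Sp - 1}
\]
where \hat{\mu}_{Y^*} is the mean of the mismeasured Y*, and Se, Sp are the sensitivity and specificity of Y* for Y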
comment in response to
post
As a primarily non-R person, I find that not having `return` makes R code harder to read (I want clarity in what a function is returning!!)
comment in response to
post
My thoughts aren't fully formed but it's something I have been thinking about. I wonder whether everything has to be 'exactly correct'. It seems like 'close enough' projections get us 'close' to the function? At least for the decimal places we report. So does it matter if it isn't exact?
comment in response to
post
I don't disagree with you. My statement was following from the use of parametric models for the nuisance functions (which remains common in epi). Yes, DR estimators offer a big advantage with more flexible modeling, and their use has increasingly picked up for that feature
comment in response to
post
But here the sandwich variance estimator IS doubly robust. So we can get both doubly robust point AND variance estimates
You can read more details in the following pre-print
arxiv.org/abs/2404.16166
comment in response to
post
The IF variance assumes that both models are correct. So our point estimator is doubly robust but the variance estimator is NOT. To me, this diminishes the motivation behind using doubly robust estimators...
comment in response to
post
Again we can use the sandwich for variance estimation
Those of you familiar with AIPW or doubly robust methods might know that we can also use an influence function (IF) variance estimator. So you might wonder why bother with the sandwich here
comment in response to
post
Step 4 is again just predicting some values from the model (doesn't require an estimating equation), so we only need Step 5. Step 5 is simply the mean of the predicted values from the outcome model (as is usual for g-computation). The following is the full stack
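Sketched out (taking a linear outcome model and writing \pi_i = expit(X_i^\top \gamma) for the missingness model, just for illustration), the stack is roughly
\[
\psi(O_i; \theta) =
\begin{bmatrix}
\{R_i - \pi_i\} X_i \\
\dfrac{R_i}{\pi_i} (Y_i - X_i^\top \beta) X_i \\
X_i^\top \beta - \mu
\end{bmatrix}
\]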
comment in response to
post
Step 2 is simply computing the weights (which doesn't need its own estimating equation). Step 3 is a weighted regression model. While we haven't done weighted regression yet, it is simply the product of the weight with the estimating function. Below, these are presented stacked together
comment in response to
post
So now let's translate this procedure into estimating equations. First is our logistic regression model for R|X. This is simply the same as the previous post
comment in response to
post
You can read a bit more about this weighted regression version of AIPW (in the context of confounding) in the following short commentary
I like this approach since it's super easy to implement and you don't need to memorize the AIPW formula
academic.oup.com/aje/article/...
comment in response to
post
To put these models together, I am going to do a simple trick based on weighted regression. Our process is:
(1) Fit a model for R given X
(2) Compute IPW
(3) Estimate a model for Y given X among R=1, weighted by the IPW
(4) Predict Y for all observations
(5) Take mean of Y
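For concreteness, here is a rough sketch of those 5 steps with delicatessen on made-up data (the toy data, the variable names, and the linear outcome model are all just placeholders for illustration, not the real analysis)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: X is the design matrix (with intercept), r indicates
# whether y was observed, and y is set to 0 where missing (those rows get
# zero weight below, so the filler value never matters)
n = 500
rng = np.random.default_rng(2023)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    gamma = theta[0:2]   # (1) missingness model parameters
    beta = theta[2:4]    # (3) outcome model parameters
    mu = theta[4]        # (5) corrected mean, the parameter of interest

    # (1) logistic model for R given X
    ee_rmod = ee_regression(gamma, X=X, y=r, model='logistic')
    # (2) inverse probability of missingness weights
    pi = inverse_logit(np.dot(X, gamma))
    ipw = r / pi
    # (3) weighted linear model for Y given X among those with Y observed
    ee_ymod = ee_regression(beta, X=X, y=y, model='linear', weights=ipw)
    # (4)-(5) predicted Y for everyone, then the mean of those predictions
    ee_mean = np.dot(X, beta) - mu
    return np.vstack([ee_rmod, ee_ymod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0., 0., 0.])
estr.estimate()
print(estr.theta[-1])                # doubly robust point estimate of the mean
print(estr.variance[-1, -1] ** 0.5)  # sandwich standard error for that estimate
```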
comment in response to
post
In the previous posts, we had two nuisance models: a model for the missingness process and a model for the outcome process. Here, we are going to put them together. The advantage of this approach is that as long as at least one of the models is correct, we will be unbiased
comment in response to
post
Unlike the usual 'robust' or 'GEE' trick used with IPW estimator, this variance estimator is not conservative. We will return to this in a future week
comment in response to
post
To complete our estimator, we can simply stack together the logistic regression estimating equations with our chosen weighted mean estimator. Again the sandwich gives us the variance directly
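A rough sketch of what that stack might look like with delicatessen (made-up data, Hajek version; names and data are just placeholders for illustration)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: r indicates whether y was observed, y is set to 0 where
# missing (the leading r in the Hajek equation zeroes those rows out anyway)
n = 500
rng = np.random.default_rng(51)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    gamma, mu = theta[0:2], theta[2]
    # Logistic model for R given X
    ee_rmod = ee_regression(gamma, X=X, y=r, model='logistic')
    # Inverse probability of missingness weights
    pi = inverse_logit(np.dot(X, gamma))
    # Hajek weighted mean: solves sum_i r_i (y_i - mu) / pi_i = 0
    ee_mean = r * (y - mu) / pi
    return np.vstack([ee_rmod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
print(estr.theta[2], estr.variance[2, 2] ** 0.5)  # Hajek mean and sandwich SE
```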
comment in response to
post
One item you might recall is that the Hajek is bounded in the parameter space, while the Horvitz-Thompson is not. Here, you can see this feature in the estimating equations. This results from where the difference between Y and \mu is computed
comment in response to
post
For the Hajek, we have a similar process of deriving the estimating equation
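Roughly,
\[
\psi(O_i; \mu) = \frac{R_i (Y_i - \mu)}{\pi(X_i; \gamma)}
\]
and setting the sum to zero and solving for \mu gives the familiar \sum_i R_i Y_i / \pi_i divided by \sum_i R_i / \pi_i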
comment in response to
post
For the Horvitz-Thompson, we can get the corresponding estimating equation after doing a little math (left as an exercise for the reader)
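Roughly,
\[
\psi(O_i; \mu) = \frac{R_i Y_i}{\pi(X_i; \gamma)} - \mu
\]
which, summed and set to zero, gives the usual \hat{\mu} = n^{-1} \sum_i R_i Y_i / \hat{\pi}_i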
comment in response to
post
To build our M-estimator, we first need a logistic regression model. Luckily for us, the logistic model has a simple score function (which gives us the estimating function directly), which is shown below
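That is, for \Pr(R = 1 | X) = expit(X^\top \gamma), the score is
\[
\psi(O_i; \gamma) = \{R_i - \text{expit}(X_i^\top \gamma)\} X_i
\]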
comment in response to
post
An IPW estimator may be implemented as:
(1) Estimate a logistic model for R given X
(2) Compute the IPW for missingness
(3) Take the Horvitz-Thompson or Hajek weighted mean
comment in response to
post
Why? Well it is because the sandwich variance 'automates' the delta method (we will return to why this is in a future week). This means we can estimate the variance of \mu while incorporating the uncertainty in estimating \beta. We don't have to bother with the bootstrap!!
comment in response to
post
Given that these are equivalent ways of programming the same estimator, why bother with the M-estimator?
The reason is that the M-estimator also gives us a way to estimate the variance directly. We can simply use the sandwich variance
comment in response to
post
Now that we have each of the pieces, we can simply stack the estimating functions into a column vector. This gives us our entire estimator, where all the pieces are estimated simultaneously
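A rough sketch of what that might look like with delicatessen on made-up data (linear outcome model; names and data are just placeholders for illustration)

```python
import numpy as np
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_regression
from delicatessen.utilities import inverse_logit

# Simulated toy data: r indicates whether y was observed, y is set to 0 where
# missing (multiplying the regression equations by r removes those rows)
n = 500
rng = np.random.default_rng(7)
x = rng.normal(size=n)
X = np.c_[np.ones(n), x]
r = rng.binomial(1, inverse_logit(0.5 + x), n)
y = np.where(r == 1, rng.normal(1 + x), 0)

def psi(theta):
    beta, mu = theta[0:2], theta[2]
    # Step 1: linear model for Y given X, restricted to r == 1 by the leading r
    ee_ymod = r * ee_regression(beta, X=X, y=y, model='linear')
    # Steps 2-3: predicted Y for everyone, then the mean of those predictions
    ee_mean = np.dot(X, beta) - mu
    return np.vstack([ee_ymod, ee_mean])

estr = MEstimator(psi, init=[0., 0., 0.])
estr.estimate()
print(estr.theta[2], estr.variance[2, 2] ** 0.5)  # corrected mean, sandwich SE
```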
comment in response to
post
Steps 2-3 can be done in the same estimating equation. We simply want the predicted values of Y and then the estimating equation for the mean. The following shows this notationally
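Roughly, for a linear outcome model,
\[
\psi(X_i; \beta, \mu) = X_i^\top \beta - \mu
\]
i.e., the predicted value for unit i minus the mean we're after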
comment in response to
post
That R at the start effectively allows only rows where Y is measured to have non-zero contributions to the estimating equations.
This covers Step 1
comment in response to
post
As we learned last time, a regression model for Y given X is pretty simple. However, we need to limit contributions to those units with a measured Y. If we let R=1 indicate that Y is observed, the corresponding estimating equation is
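Roughly, if we use a linear model for the outcome,
\[
\psi(O_i; \beta) = R_i (Y_i - X_i^\top \beta) X_i
\]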
comment in response to
post
That last step gives us the estimate of the mean of Y that accounts for missingness driven by the covariate set.
To turn this 3-step process into an M-estimator, we need to translate each step into estimating equations. Then we can simply stack them together
comment in response to
post
So, here is our procedure:
(1) Estimate a model for the variable with missing data (e.g., Y) given observed covariates we think are needed for exchangeability
(2) Using the estimated model, generate predicted values of the outcome for all observations
(3) Take the mean of those predicted values