The way that psychologists talk about different “types” of validity and reliability — is that another variation of not clearly distinguishing between the theoretical estimand and the statistical estimate? Like there’s one reliability but different ways to infer it based on different assumptions…
Comments
validity estimand is like bias: E[measurement - latent]
reliability estimand is like variance: V[measurement]
tough q is what the E and V are taken with respect to! it's not over samples or assignments -- is it imagined error term?
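One way to make the "imagined error term" reading concrete is a small simulation. This is only a sketch under assumptions not stated in the thread: a single person with a fixed latent score, an additive systematic offset, and Gaussian noise over hypothetical replicate measurements. The function name and all parameter values are made up for illustration.

```python
import random
import statistics


def simulate_measurements(latent, offset, noise_sd, n_replicates, seed=0):
    """Draw hypothetical replicate measurements of ONE person.

    The expectation and variance below are taken over these imagined
    replicates (the error term), not over people or treatment assignments.
    """
    rng = random.Random(seed)
    return [latent + offset + rng.gauss(0.0, noise_sd) for _ in range(n_replicates)]


latent = 10.0  # assumed fixed latent score for this person
measurements = simulate_measurements(
    latent, offset=0.5, noise_sd=2.0, n_replicates=20000
)

# Validity-like quantity: E[measurement - latent], i.e. bias.
estimated_bias = statistics.fmean([m - latent for m in measurements])

# Reliability-like quantity: V[measurement], i.e. error variance for this person.
estimated_variance = statistics.pvariance(measurements)

print(f"bias ~ {estimated_bias:.2f}, variance ~ {estimated_variance:.2f}")
```

Under these assumptions the bias estimate should land near the assumed offset and the variance near `noise_sd ** 2`. Whether that imagined replicate distribution is the right reference population is exactly the open question above.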
interesting to think about "types" as estimators
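A rough sketch of that idea, assuming the classical test theory setup (observed = true score + independent error) and two strictly parallel forms. The estimand is a single number, Var(true) / Var(observed), and the two functions below are different estimators of it. They are only loosely analogous to the textbook "types" of reliability, and all names and parameter values here are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_parallel_forms(n_people, true_sd, error_sd, seed=0):
    """Two parallel forms: same true score, independent errors of equal variance."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n_people)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return form_a, form_b


def reliability_from_correlation(form_a, form_b):
    """Estimator 1: the correlation between the two forms."""
    return pearson(form_a, form_b)


def reliability_from_differences(form_a, form_b):
    """Estimator 2: 1 - error variance / observed variance.

    With parallel forms, Var(A - B) = 2 * error variance.
    """
    diffs = [a - b for a, b in zip(form_a, form_b)]
    error_var = statistics.pvariance(diffs) / 2.0
    observed_var = statistics.pvariance(form_a)
    return 1.0 - error_var / observed_var


true_sd, error_sd = 1.0, 1.0
estimand = true_sd**2 / (true_sd**2 + error_sd**2)  # the one theoretical target

form_a, form_b = simulate_parallel_forms(20000, true_sd, error_sd)
est_corr = reliability_from_correlation(form_a, form_b)
est_diff = reliability_from_differences(form_a, form_b)

print(f"estimand={estimand:.3f} corr={est_corr:.3f} diff={est_diff:.3f}")
```

Both estimators should agree with the estimand here because the parallel-forms assumption holds by construction. When it fails (for example, unequal error variances across forms), they can diverge, which is the sense in which each "type" carries its own assumptions.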
I never quite liked all the versions of "validity" as distinct criteria. I always assumed someone thought about them longer than I have, so there must be a reason.
https://pure.manchester.ac.uk/ws/portalfiles/portal/51526413/24._Hughes_Psychometric_Validity._Establishing_the_Accuracy_and_Appropriateness_of_Psychometric_Measures.pdf
I review the history of validity theory and present a 2-stage process (along with evidence-types) to ensure psychometric Accuracy and Appropriateness
https://bsky.app/profile/mattansb.bsky.social/post/3lltz5edrsc2s
As for types of (construct) validity: afaik those are passé in psychometric circles, but have lingered on in psych-undergrad programs for some reason (maybe they'll update the syllabus in 20 years).
Denny Borsboom tried to make an argument like that, right? That validity is just "Something is valid if it exists and causes variation in the measurement instrument". And then by extension all the "types" of validity are just ways of describing the elephant.
Goodwin, L. D., & Leech, N. L. (2003). The meaning of validity in the new standards for educational and psychological testing: Implications for measurement courses.
My favorite example is the various reliability metrics for multilevel/longitudinal data. I measure X of person S at time T; is it a good measure? Sometimes it's a very good measure of person S, but not at time T specifically, etc.