dingdingpeng.the100.ci • 81 days ago

The way that psychologists talk about different “types” of validity and reliability — is that another variation of not clearly distinguishing between the theoretical estimand and the statistical estimate? Like there’s one reliability but different ways to infer it based on different assumptions…

Comments

aecoppock.bsky.social•81 days ago

do you think:
validity estimand is like bias: E[measurement - latent]
reliability estimand is like variance: V[measurement]

tough q is what the E and V are taken with respect to! it's not over samples or assignments -- is it an imagined error term?

interesting to think about "types" as estimators
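
One way to make the "with respect to what?" question concrete is the classical test theory setup. This is only a sketch under assumed notation that does not appear in the thread: $X_{pj}$ is the observed score of person $p$ on hypothetical replication $j$ of the measurement, and $\theta_p$ is the latent attribute.

```latex
\begin{align}
T_p &= \mathbb{E}_j\left[X_{pj}\right]
  && \text{true score: expectation over replications within person } p \\
E_{pj} &= X_{pj} - T_p, \qquad \mathbb{E}_j\left[E_{pj}\right] = 0
  && \text{error is mean-zero by construction} \\
\text{reliability} &= \frac{\operatorname{Var}_p(T_p)}
  {\operatorname{Var}_p(T_p) + \mathbb{E}_p\left[\operatorname{Var}_j(E_{pj})\right]}
  && \text{true-score variance over total variance} \\
\text{bias}_p &= \mathbb{E}_j\left[X_{pj}\right] - \theta_p = T_p - \theta_p
  && \text{requires } \theta_p \text{ defined independently of the measure}
\end{align}
```

In this notation the inner expectation and variance run over imagined replications of the same measurement on the same person, and the outer ones run over people. The bias line is only meaningful if $\theta_p$ is something other than the true score $T_p$, since the error relative to $T_p$ averages to zero by definition.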

hbru.bsky.social•81 days ago

I hope this thread brings clarity. I am teaching this stuff in 3 weeks.

I never quite liked all the versions of "validity" as distinct criteria. I always assumed someone thought about them longer than I have, so there must be a reason.

dingdingpeng.the100.ci•81 days ago

I’m currently teaching the Borsboom version (bit biased because of course I’ll go with the causal take) and then frame the “types” of validity as different ways to collect evidence for such validity, but not sure that’s fully consistent and the most productive solution.

hbru.bsky.social•81 days ago

I took a look at the paper and I like it. But under this view ideas like "discriminant validity" do not make sense. But probably they did not make sense in the first place.

dingdingpeng.the100.ci•81 days ago

I do think you’d want that other constructs don’t have strong causal effects on your measure, so that’s how I’m currently framing it.

david-j-hughes.bsky.social•80 days ago

A chapter I wrote a while back might be useful. Has been for colleagues and a good overview for students:
https://pure.manchester.ac.uk/ws/portalfiles/portal/51526413/24._Hughes_Psychometric_Validity._Establishing_the_Accuracy_and_Appropriateness_of_Psychometric_Measures.pdf

I review the history of validity theory and present a 2-stage process (along with evidence-types) to ensure psychometric Accuracy and Appropriateness

felipefv.bsky.social•81 days ago

Chapter 2 here might be nice for students to get a summary of the mess: https://www.repository.cam.ac.uk/items/895d37e1-5dd4-4f39-867f-d24eca7a136a

gidon-frischkorn.bsky.social•81 days ago

With respect to reliability, I would say yes, probably more precisely there is one reliability (the ratio of true score variance to total variance) and different sets of assumptions (including generalizability theory) that make certain statistical estimates proper estimators for it.
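
The "one estimand, several estimators" reading can be illustrated with a small simulation. This is a minimal sketch and not anyone's method from the thread: the sample size, the number of items, and the variances are made-up assumptions, and the items are generated as strictly parallel (same true score, equal error variance), which is the assumption under which both estimators below target the same ratio.

```python
import random

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Assumed (hypothetical) design: 5000 people, 4 strictly parallel items per form.
rng = random.Random(1)
N_PEOPLE, N_ITEMS = 5000, 4
TRUE_SD, ERROR_SD = 1.0, 1.0

true_scores = [rng.gauss(0.0, TRUE_SD) for _ in range(N_PEOPLE)]

def administer_form():
    """One form: each item is the person's true score plus independent error."""
    return [[t + rng.gauss(0.0, ERROR_SD) for t in true_scores]
            for _ in range(N_ITEMS)]

def sum_scores(items):
    """Sum score per person across the items of one form."""
    return [sum(person_items) for person_items in zip(*items)]

def cronbach_alpha(items):
    """Cronbach's alpha computed from a list of item score lists."""
    k = len(items)
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(sum_scores(items)))

form_a, form_b = administer_form(), administer_form()
total_a, total_b = sum_scores(form_a), sum_scores(form_b)

# Estimand: true-score variance of the sum score over its total variance.
theoretical = (N_ITEMS * TRUE_SD) ** 2 / (
    (N_ITEMS * TRUE_SD) ** 2 + N_ITEMS * ERROR_SD ** 2
)
# Estimator 1: correlation between two parallel forms.
parallel_forms = pearson(total_a, total_b)
# Estimator 2: internal consistency of a single form.
alpha = cronbach_alpha(form_a)

print(f"estimand {theoretical:.3f}  parallel forms {parallel_forms:.3f}  alpha {alpha:.3f}")
```

With these settings the population ratio is 16 / (16 + 4) = 0.8, and both estimates should land near it. Changing the data-generating process (for example, unequal loadings or correlated errors) would break the assumptions for one or both estimators without changing the definition of the ratio itself.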

gidon-frischkorn.bsky.social•81 days ago

In case of validity, I am very sympathetic to Borsboom’s definition. And based on this perspective there is a more complicated issue with respect to quantifying validity. Following the Borsboom definition, validity is the fit of a theory to a measurement of the processes proposed by the theory.

gidon-frischkorn.bsky.social•81 days ago

This would require formal models of the processes supposed to cause variation in a measurement and how they translate into observed behavior. And I would say this needs to be something more than an IRT model (which is more of a statistical model from my perspective)

gidon-frischkorn.bsky.social•81 days ago

And even if there is a formal model, it will likely contain several parameters. Thus, any indicator will capture variance from multiple processes contributing to variation in the indicator.

gidon-frischkorn.bsky.social•81 days ago

I am currently working together with @koberauer.bsky.social on two projects exploring how investigating validity would work this way.

dingdingpeng.the100.ci•81 days ago

Uuuuh, I’m intrigued! Looking forward to learning what comes out of that 👀

boryslaw.bsky.social•81 days ago

I really like what you are saying in your series of replies, but I don't get the part about quantifying *Borsboom's* validity. According to this definition, it is a binary property - a method is either valid or not; there is no in-between. What is quantifiable is evidence of validity, right?

gidon-frischkorn.bsky.social•81 days ago

From my perspective there are two sides to this coin: 1) does the formal model fit the observed data. If yes, the model is valid. But then you can ask 2) variation in which parameters of the model is causally responsible for variation in the observed data. This is a matter of degree and not binary.

boryslaw.bsky.social•81 days ago

I see. That is a very different notion though, since Borsboom's validity is a property of an actual measurement process; in particular, it is not model-relative. A false or uninterpretable model may fit the data, a perfectly valid (in this sense) measurement method may be based on a false model, etc

mattansb.bsky.social•81 days ago

IMO the different types of reliability do map onto different estimands, eg:
https://bsky.app/profile/mattansb.bsky.social/post/3lltz5edrsc2s

As for types of (construct) validity- afaik those are passé in psychometric circles, but have lingered on in psych-undergrad programs for some reason (maybe they'll update the syllabus in 20 years).

edkroc.bsky.social•81 days ago

IMO, it's more fundamental than that. Psychologists refuse to define coherent measurands (sometimes for good-ish reasons). Estimates (for any estimand) presuppose coherent measurements for a target measurand. The two should not be conflated.

drewhalbailey.bsky.social•81 days ago

I think that's the way some people talk about types of validity and reliability. But I view (threats to) validity typologies as compatible with estimands: threats to validity are ways that mapping between estimates and estimands can go wrong!

dingdingpeng.the100.ci•81 days ago

Aaaah but then you’re probably thinking in terms of internal validity, external validity, construct validity…? I should have been more precise; I was thinking about validity in the measurement context (construct validity, face validity, criterion validity…)

drewhalbailey.bsky.social•81 days ago

Ah got it, thanks. In this case, I guess I agree the link between theory and these statistics is often squishy!

dingdingpeng.the100.ci•81 days ago

@conjugateprior.org @boryslaw.bsky.social i think this is the type of question for the type of people you are

conjugateprior.org•81 days ago

No idea about psychologists but fwiw (and this is not going to help anyone even slightly) I think of types of validity as a way to make the case for one's measure under the Ramsey view of scientific theories. "Concept X has these parts, scope & theoretical relations. And look, my measure does too".

boryslaw.bsky.social•81 days ago

If by "variation" you mean a special case, then I guess sometimes yes, sometimes no, right? The term validity was explicitly defined in a bunch of ways by a bunch of authors, and not all of these definitions can be encoded in the language of causality.
1/n

boryslaw.bsky.social•81 days ago

My favorite def. of validity (Borsboom's) is causal, but most researchers do not seem to know it. That most don't is only my guess, though. Or consider this characterization of reliability from Wikipedia:
2/n

boryslaw.bsky.social•81 days ago

"It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores." It is impossible to tell random from systematic variation in measurements without introducing causal notions ...
3/n

boryslaw.bsky.social•81 days ago

... (or talking about the measurement *process*), and this seems to be a characterization of a single concept, right? What I find most interesting about your question is the hinting at a more general, abstract view :-)
4/n

boryslaw.bsky.social•81 days ago

On that note, isn't this essentially just another example of psychologists confusing methods with the goals these methods are supposed to serve? And often having no clue what the goals are. I.e., cargo cult :-)
n/n

kevinmking.bsky.social•81 days ago

I've always struggled with that.

Denny Borsboom tried to make an argument like that, right? That validity is just "Something is valid if it exists and causes variation in the measurement instrument". And then by extension all the "types" of validity are just ways of describing the elephant.

dingdingpeng.the100.ci•81 days ago

Yeah that’s what I’m thinking of. But also for reliability that’s how I teach it already (definition, hypothetical strictly parallel measurements, different approximations)

nilspetras.bsky.social•81 days ago

I would rather argue that validity is a characteristic of claims, not things. A measure cannot be valid. A claim derived from the scores of a measure can be valid. For that, you also need arguments (express the claim and its relationship to your statistical test), not only estimates based on scores.

eikofried.bsky.social•81 days ago

That's why I've liked the STANDARDS 2014 once I discovered them!

nilspetras.bsky.social•81 days ago

I regularly preach this to my students for this exact reason! https://doi.org/10.1080/07481756.2003.11909741

Goodwin, L. D., & Leech, N. L. (2003). The meaning of validity in the new standards for educational and psychological testing: Implications for measurement courses.

rwidome.bsky.social•81 days ago

I think so. And I think the issue is that in epi, with validity we usually mean criterion validity when we use the word validity, so we don't think as much about there being types. Because we call the other things by other names.

sanjaysrivastava.com•81 days ago

I view Cronbach’s G theory as trying to challenge the idea of “one estimand” for reliability. Reliability is consistency in scores. Ok fine, consistency over what? Raters? Items? Time? These aren’t different estimates of the same thing; they’re different things
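
The "consistency over what?" point can be sketched with a toy simulation in which the same scores give two different reliabilities depending on the facet. Everything here is an assumption made for illustration: the variance components, the two-rater and two-occasion design, and the component names are hypothetical and not taken from the thread or from any particular G-theory software.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(7)
N_PEOPLE = 5000

# Hypothetical variance components of a single observed score.
VAR_PERSON = 1.0       # stable person differences
VAR_STATE = 1.0        # person-by-occasion fluctuation (shared by raters)
VAR_RATER_VIEW = 0.25  # person-by-rater idiosyncrasy (shared across occasions)
VAR_NOISE = 0.25       # residual noise

def draw(var):
    """Draw one mean-zero normal value with the given variance."""
    return rng.gauss(0.0, var ** 0.5)

# scores[(rater, occasion)] holds one score per person.
scores = {(0, 0): [], (1, 0): [], (0, 1): []}
for _ in range(N_PEOPLE):
    person = draw(VAR_PERSON)
    state = [draw(VAR_STATE) for _ in range(2)]      # one per occasion
    view = [draw(VAR_RATER_VIEW) for _ in range(2)]  # one per rater
    for (rater, occasion), column in scores.items():
        column.append(person + state[occasion] + view[rater] + draw(VAR_NOISE))

# Consistency over raters: same occasion, different raters.
inter_rater = pearson(scores[(0, 0)], scores[(1, 0)])
# Consistency over time: same rater, different occasions.
retest = pearson(scores[(0, 0)], scores[(0, 1)])

print(f"inter-rater {inter_rater:.3f}  test-retest {retest:.3f}")
```

Under these assumed components, two raters on the same occasion share the person and state variance (2.0 of 2.5), while two occasions with the same rater share the person and rater-view variance (1.25 of 2.5). The two correlations therefore converge on different population values, roughly 0.8 and 0.5, which is the sense in which they are different estimands and not competing estimates of one number.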

mattansb.bsky.social•81 days ago

And vice versa - you can have one estimate for different estimands.

My favorite examples are the various reliability metrics for multilevel/longitudinal data. I measure X of person S at time T, is it a good measure? Sometimes it's a very good measure of person S, but not at time T specifically, etc.

kevinmking.bsky.social•81 days ago

I think the notion of the estimand is the same though. Right? It's not one thing; every statistical estimand is a specific expression of a hypothesis test or inference. It's all in a specific context.

boryslaw.bsky.social•81 days ago

Doesn't reliability defined as consistency in scores imply that every constant function is a perfectly reliable method of measuring every measurement target? :-)

sanjaysrivastava.com•81 days ago

Yup. Reliable, not valid

boryslaw.bsky.social•81 days ago

I cannot tell if you are joking or not!

sanjaysrivastava.com•81 days ago

https://bsky.app/profile/sanjaysrivastava.com/post/3ljbysektcs2u

tailcalled.bsky.social•81 days ago

I think yes but also I think there are multiple types of validity. The "life expectancy is correlated with GDP per capita so therefore life expectancy is the same as GDP per capita" fallacy illustrates one sense of this.

tailcalled.bsky.social•81 days ago

There are times (e.g. when you are an actuary working for an insurance company) where you want to have a high correlation with what you are trying to measure, but I feel like in science you typically want a "proper measurement".

tailcalled.bsky.social•81 days ago

(I'm not actually sure you can define "proper measurement" in a daggy way, at least not without going all the way to Pearl's level 3? Because you want to be able to say something like "the measurement tool works the same way for each individual".)
