Scientific Integrity Question
Consider a fictional AI system M, and a scientist S who wishes to use M in their research pipeline (e.g., for analysis and/or interpretation). Functionally, M is not fully understood by S. In your opinion, is it acceptable that S uses M in their research pipeline?
Comments
imo no. I just trust it
Should I know how ANOVA works before using it? Probably yes
Point being there are *degrees* of criticality of understanding the tools you use, depending on use
https://github.com/nosratullah/LCG-RandomGenerator
For most researchers in psychology or neuro, merely understanding that RNGs need new seeds is probably the extent that they need to understand them.
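A minimal sketch of why seeds matter, assuming a toy linear congruential generator (this is not the code from the linked repo; the constants are the common Numerical Recipes ones):

```python
# Toy linear congruential generator (LCG); constants are the common
# Numerical Recipes parameters, not taken from the linked repository.
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Yield n pseudo-random integers starting from `seed`."""
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        yield x

# Reusing a seed reproduces the exact same "random" sequence,
# so fresh randomness requires a fresh seed.
print(list(lcg(seed=42, n=3)))  # some sequence
print(list(lcg(seed=42, n=3)))  # identical to the line above
print(list(lcg(seed=43, n=3)))  # a different sequence
```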
But I agree with your point. We can't be knowledgeable of everything. That's one of the reasons that we collaborate. To cover our blindspots!
I've found this relevant study regarding RNGs:
https://eprint.iacr.org/2024/578.pdf
you don't need to fully understand "people" to understand how responses are scored
https://bsky.app/profile/emp1.bsky.social/post/3lebu2koxac2h
https://bsky.app/profile/irisvanrooij.bsky.social/post/3leamakhegk2k
https://philsci-archive.pitt.edu/19384/
Conversely this is a considerably lower bar than complete understanding.
- stats (without getting mat mult, optimizers)
- software (w/o programming)
- screwdrivers (w/o smelting knowledge)
- RAs (can a human ever be fully understood?)
It’s neither a verifiable source nor able to do these tasks.
As a former fMRI person, I think this happens quite often.
As a person very invested in data provenance, I’d say it would be acceptable only if S involves some collaborators who understand M more fully.
Most fields use tools that are not fully understood.
For example, we don't "fully understand" what the Beck Depression Inventory is measuring, and yet it is a critical part of many research pipelines.
Any real scientist is a practical person focused on getting high-quality results.
If he or she or it can get good, high-quality results faster with AI, then he or she or it will do that.
Just like normal science: you read a paper or hear a talk and incorporate it into your work, and if the paper or talk proves wrong, that person goes on your personal sh*t list.
Happens all the time.
Either A Kornberg or S Brenner:
"Anyone can publish whatever they want and science will sort it out."
PS: in my field, it is hard to find two people of greater stature than Kornberg and Brenner.
I also answered here https://bsky.app/profile/irisvanrooij.bsky.social/post/3leboiwr5js27
https://bsky.app/profile/irisvanrooij.bsky.social/post/3lco46jmemc2y
https://link.springer.com/article/10.1007/s43681-022-00184-2
But I agree with you on the other dimensions being important, too.
Many scientists don’t really understand stats, for ex - e.g., they’ve been told they should use ANOVAs, so they do, but they don’t know why that vs. other methods.
I would say it depends. There is a class of problems that are hard to solve, but easy to check whether the solution is correct. I think those are the best use cases for AI as of now.
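A toy illustration of that asymmetry, assuming the tool hands back a claimed factorisation of a large integer: checking the claim is a single multiplication, even though producing it is hard.

```python
# Toy illustration: verifying a proposed solution is cheap even when
# finding it is hard. Suppose a black-box tool claims that n = p * q.
def check_factorisation(n, p, q):
    """Accept the claimed factors only if they really multiply back to n."""
    return p > 1 and q > 1 and p * q == n

n = 1000003 * 1000033                            # finding these factors is the hard direction
print(check_factorisation(n, 1000003, 1000033))  # True  -> safe to use downstream
print(check_factorisation(n, 1000003, 1000037))  # False -> reject, however confident the tool sounds
```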
I don't fully understand Google's search algorithm, but surely nobody would object to me using it to find papers, because my uncertainty cannot trickle down to false results
On the other hand, if I ask a black box questions that go straight into the paper,…
Everything must be verified, driven by, and passed through you; this maximises the benefit.
That said, even something as simple as matrix multiply isn't fully understood once it hits a machine. And that's a good thing.
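For example, a small numpy sketch (hypothetical matrices): the mathematically equivalent groupings (AB)C and A(BC) need not agree bit-for-bit, because floating-point addition is not associative.

```python
import numpy as np

# Mathematically, (A @ B) @ C == A @ (B @ C); in float64 the two orderings
# accumulate rounding error differently and rarely match exactly.
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((200, 200)) for _ in range(3))

left = (A @ B) @ C
right = A @ (B @ C)

print(np.array_equal(left, right))    # typically False
print(np.max(np.abs(left - right)))   # small but non-zero rounding difference
```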
There are tools we (as a community) trust more, and tools we trust less, and those evaluations change over time.
I might be putting far more emphasis on "fully understand" than you intended. If so, feel free to disregard.
https://bsky.app/profile/glinden.bsky.social/post/3ldguapxh6c2p
(I think fake data is a violation of scientific integrity regardless of the answer)
For me, I also have problems with scientists using and interpreting p-values wrongly, and using regression without understanding what they are doing…
Wrt fake data: using it might not result in harm, but I feel it would be unreasonable to *expect that it wouldn't*.
Do all neuroscientists understand entirely how an MRI works?
IMHO, it is more a matter of epistemologically sound interpretation—understanding what can be concluded and what can't from the results (and that is/should be the job of people making those AI systems/tools).
https://journals.sagepub.com/doi/10.1177/20539517231155060
But I would claim that most scientists don't fully understand these parts of their current research pipeline:
Research assistants
Their computer
Statistical methods
Their measurement instruments
The emphasis was meant to be on *functionally* understood.
Also, the question is normative, not descriptive.
Hope this helps clarify
Can you say more about what "functional" means in this context?
I do not think “is” implies “ought”.
I also think it is worthwhile for scientists to critically reflect on our own standards. 1/
And how do we design an easily understandable knowledge base?
E.g. if M is a labeling tool, and they apply it reproducibly (eg versioned local copy), and appropriately validate its output (eg IRR with human experts + check for error biases), then I wouldn’t object.
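As a sketch of that kind of validation (hypothetical labels; `cohen_kappa_score` and `confusion_matrix` are from scikit-learn):

```python
# Validate a version-pinned labeling tool against human experts
# (hypothetical labels; requires scikit-learn).
from sklearn.metrics import cohen_kappa_score, confusion_matrix

model_labels  = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "neu"]
expert_labels = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg"]

# Inter-rater reliability between the tool and the human experts.
print("Cohen's kappa:", round(cohen_kappa_score(model_labels, expert_labels), 2))

# Check whether the tool's errors pile up on particular classes.
print(confusion_matrix(expert_labels, model_labels, labels=["pos", "neu", "neg"]))
```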
I use AI for translation. Not for text analysis, but for reading reports and papers.
But I use plenty of other computer programmes too that I don't understand either.
Also, how much do they functionally understand it?
If they have a trustworthy model card / specification with well-defined confusion matrix, well-understood output distributions… that kind of thing—it’s probably OK
I think there are other reasons M may be unsuitable for their task, but this isn't a bar we hold other research tools to.
An AI has no ethical locus -- and (usually) has been built with stolen IP. Exactly contrary to a central goal of scientific authorship.
IMO the question is who takes responsibility for how and why they are making claims.
Realistically, usage of LLMs seems overwhelmingly likely to fail both parts of that test.
"Yes, it is very true, that. And it is just what some people will not do. They conceive a certain theory, and everything has to fit into that theory. If one little fact will ...
Another example is the use of the telescope for astronomical observations by Galileo. Is the Moon imperfect or the telescope?
As for “Against Method”, I’ll think more about this. I think there is a difference between what Feyerabend argued for versus what is sold to science in AI hype … but maybe not.
But will think more.
In the example of linear regression, seems to me that linear regression is functionally well understood (i.e., its input-output mapping is well-defined). Causality is a different beast. It is a concept.
I didn't know that there was the possibility of negative weights in linear regression. So, not sure we can reduce it to a problem with the concept of causality.
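A small numpy sketch (simulated data) of how that happens: a predictor that is essentially uncorrelated with the outcome can still receive a clearly negative weight once the other predictor is in the model, which is one reason regression weights resist a naive causal reading.

```python
import numpy as np

# Suppressor effect: x2 is uncorrelated with y on its own, yet its
# regression weight comes out clearly negative (simulated data).
rng = np.random.default_rng(1)
n = 10_000
t = rng.standard_normal(n)           # the signal y actually depends on
u = rng.standard_normal(n)           # measurement noise in x1
y = t + 0.5 * rng.standard_normal(n)
x1 = t + u                           # noisy measurement of the signal
x2 = u                               # the noise itself, unrelated to y

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 2))                          # roughly [0.0, 1.0, -1.0]
print(round(float(np.corrcoef(x2, y)[0, 1]), 2))  # ~0.0
```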
The real reason I trust these tools is their consistency with other methods. I would say that AI results come into that category.
In doing testing of electrical equipment, being able to verify the results requires being able to verify all the equipment used in the testing.
I'd apply the same logic to software, so in order for S to verify the effect of system M in their research, they have to know how it works.
but some are ....
profitable
E.g., engineers may be given tools to scrape, store, and analyze the Internet at scale, but no money for licenses, nor any mention, accountability, or even a path to discuss such.
The outcome is to use what they are given, i.e., tools for theft.
Better thinking from the start can help prevent this waste imho
Fwiw, the OP was meant to be about present-day AI systems; I doubt they fit your scenario.