Looks interesting, and relevant to the issue of how LLMs interpret the p-values reported in science papers.
This is an open-ended question, and the vast majority of the scientific literature doesn't interpret p-values correctly. Just think of all the 'trends toward significance' phrases out there.
Reposted from
Valentin Amrhein
"We find that ChatGPT, Gemini, and Claude fall prey to dichotomania at the 0.05 and 0.10 thresholds commonly used to declare ‘statistical significance’."
doi.org/10.1017/jdm....