msonderegger.bsky.social
Linguistics, statistics, speech, cognitive science | McGill University Department of Linguistics
11 posts 219 followers 137 following
comment in response to post
I sympathize with the problem, though, and have seen people in this situation just exclude all words with frequency < some_constant, which seems worse.
comment in response to post
So there'd be a pseudo-word called "low_frequency_word"? I think I wouldn't do this, because it loses a lot of information... but I'm not sure what harm it'd do, if any. It would probably affect the random-effect variance estimate, and thus maybe the SEs for word-level predictors? Curious what you find.
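Roughly what I mean, as a sketch (the simulated data, the cutoff of 5, and all variable names are made up for illustration; lme4 supplies the model):

library(lme4)

set.seed(1)
n_words <- 30
words <- paste0("w", 1:n_words)
word_effect <- setNames(rnorm(n_words, sd = 20), words)
# skewed word frequencies, so some words end up rare
d <- data.frame(word = sample(words, 400, replace = TRUE, prob = rexp(n_words)))
d$rt <- 550 + word_effect[d$word] + rnorm(400, sd = 40)
d$freq <- ave(rep(1, nrow(d)), d$word, FUN = sum)

# Option 1, the thresholding approach from upthread: drop rare words.
d_excluded <- subset(d, freq >= 5)

# Option 2: keep them, collapsed into a single pseudo-word level.
d$word_collapsed <- ifelse(d$freq < 5, "low_frequency_word", d$word)

# Comparing the random-intercept variances shows where collapsing might
# bite (and, downstream, the SEs for word-level predictors).
VarCorr(lmer(rt ~ 1 + (1 | word), data = d))
VarCorr(lmer(rt ~ 1 + (1 | word_collapsed), data = d))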
comment in response to post
Could you add me? Thanks for doing this!
comment in response to post
Not sure if you're doing Bayesian or frequentist models, but I've also used the neutralization dataset for different topics in Bayesian models here (feel free to take anything): people.linguistics.mcgill.ca/~morgan/ling... The "smallest effect size of interest" connects nicely to ROPE.
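For the ROPE connection, a minimal sketch with brms and bayestestR (the model, the data frame neut_data, and the +/- 5 ms bounds are hypothetical stand-ins for a smallest effect size of interest):

library(brms)
library(bayestestR)

# Hypothetical incomplete-neutralization model: vowel duration as a
# function of underlying voicing, with by-word random intercepts.
m <- brm(duration ~ voicing + (1 | word), data = neut_data)

# Set the ROPE to the smallest effect size of interest (here +/- 5 ms) and
# ask how much posterior mass for the effect falls inside it.
equivalence_test(m, range = c(-5, 5))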
comment in response to post
I like your incomplete neutralization dataset! There are extensive exercises with it in my book. The 'english' and 'french_cdi' (word learning) datasets there are also simple and intuitive; I've used them in data science courses.
comment in response to post
Ooh, agree. There's a section on this in my book -- random intercepts for "nuisance variables" with many levels. Shamelessly mentioning it in case that helps more than 2 people read it...
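For concreteness, the pattern looks like this (a sketch; rt, condition, word, and dat are hypothetical names):

library(lme4)

# A nuisance variable with many levels (here, word) goes in as a random
# intercept instead of hundreds of fixed-effect dummies.
m <- lmer(rt ~ condition + (1 | word), data = dat)
summary(m)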
comment in response to post
You can kind of do this using webR's "Line-by-line Execution": quarto-webr.thecoatlessprofessor.com/qwebr-code-c... Not exactly what you're saying, but close?
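A minimal setup along those lines, sketched from the linked docs (treat the details as assumptions): add the extension with "quarto add coatless/quarto-webr", then a document like

---
title: "Interactive example"
filters:
  - webr
---

```{webr-r}
# readers can step through this cell line by line in the browser
x <- rnorm(100)
mean(x)
```

gives readers runnable cells with line-by-line execution controls.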
comment in response to post
Deadline extension! Abstracts for CorpusPhon are now due Wednesday, March 13 AoE. We are also excited to announce Dr. Michael McAuliffe @mmcauliffe.bsky.social as our invited speaker. Hope you can join us!