cbarrie.bsky.social • 128 days ago

Pleased to share the latest version of my paper with Arthur Spirling and @lexipalmer.bsky.social on replication using LMs

We show:

1. current applications of LMs in political science research *don't* meet basic standards of reproducibility...

Comments

dandekadt.bsky.social•128 days ago

Sorry I'm not going to cite this until Arthur joins bsky

cbarrie.bsky.social•128 days ago

I'm currently sending screenshots to him of responses to the paper on WhatsApp. This isn't sustainable

joshmccrain.bsky.social•127 days ago

tell him i've been making fun of him a lot on here and he's missing it

mashakr.bsky.social•127 days ago

Seconded

librarykilleen.bsky.social•128 days ago

📌

gabrielrega.bsky.social•61 days ago

📌

wanlo.bsky.social•127 days ago

Great to see such strong arguments for using "open-weight" LLMs! Maybe setting random seeds could be added to the advice to practitioners? Most interfaces seem to support this now—huggingface, OpenAI, Ollama, vllm,…
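To illustrate the seed suggestion above, here is a minimal toy sketch in plain Python. It does not call any of the interfaces named in the comment; `sample_label` and `toy_logits` are hypothetical stand-ins for an LM's sampling step. The sketch only shows why fixing a seed (or decoding greedily at temperature 0) makes repeated draws identical. Hosted services may treat a seed as best effort, so this should be read as an illustration of the idea, not a guarantee about any provider.

```python
import math
import random


def sample_label(logits, temperature, seed=None):
    """Draw one label from a toy score table (hypothetical stand-in for an LM)."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring label.
        return max(logits, key=logits.get)
    # A private generator means the seed alone determines the draw.
    rng = random.Random(seed)
    labels = list(logits)
    # Softmax-style weights: lower temperature sharpens the distribution.
    weights = [math.exp(logits[label] / temperature) for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


# Hypothetical scores for a three-way classification task.
toy_logits = {"positive": 1.2, "negative": 1.0, "neutral": 0.3}

# Same seed, same temperature: every repetition returns the same label.
same_seed = {sample_label(toy_logits, 1.0, seed=42) for _ in range(100)}

# Temperature 0: the output never depends on a seed at all.
greedy = {sample_label(toy_logits, 0) for _ in range(100)}

# No seed at temperature 1: repeated draws are free to differ.
unseeded = {sample_label(toy_logits, 1.0) for _ in range(100)}

print(len(same_seed), greedy, sorted(unseeded))
```

In a real pipeline the seed would be recorded alongside the model version and prompt so that a replicator can rerun the same configuration.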

retropz.bsky.social•128 days ago

📌

jbgruber.bsky.social•127 days ago

"We used the version accessible through huggingface which has additional functionality over the base model, including integration with vllm which allows for faster responses. This was useful given how difficult it is to run a local LM efficiently." what do you mean by difficult here?

cbarrie.bsky.social•127 days ago

I defer to @lexipalmer.bsky.social here but my understanding is we were struggling with speed of returns

matzefrey.bsky.social•128 days ago

📌

nadiah.bsky.social•128 days ago

Bookmarking to read later 📌

bhashmazumder.bsky.social•128 days ago

📌

foxnic.bsky.social•127 days ago

📌

altiam.bsky.social•128 days ago

Revealing insights! How do you see this evolving?

cbarrie.bsky.social•128 days ago

2. That there is meaningful (and often unacceptably large) variance between rounds of using LMs even on the same data with the same prompt...

tomasruiz.bsky.social•128 days ago

Do you guys find large variance using the same fixed model id and temperature=0? What percent of classifications flip? 🤔 I would hope this number is low...

brendannyhan.bsky.social•128 days ago

yes, same question re: testing with same model and temperature=0. we've found high correspondence

cbarrie.bsky.social•128 days ago

Good questions! We don't set a model id, and we keep temperature at the default of 1.

We're gonna set up a routine to test this.

I would say though that irrespective of model, we see drastically different downstream results *between* models + several ceased to exist (making model id moot)
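The "what percent of classifications flip" question above can be made concrete with a small routine. This is a sketch under stated assumptions, not the authors' procedure: it assumes each run is a list of labels for the same items in the same order, and the function names `flip_rate` and `modal_agreement` as well as the example labels are made up for illustration.

```python
from collections import Counter


def _check(runs):
    """Validate that runs are non-empty and cover the same number of items."""
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one item")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("all runs must label the same number of items")


def flip_rate(runs):
    """Share of items whose label is not identical across all runs."""
    _check(runs)
    # zip(*runs) groups the labels that each item received across runs.
    flipped = sum(1 for labels in zip(*runs) if len(set(labels)) > 1)
    return flipped / len(runs[0])


def modal_agreement(runs):
    """Mean share of runs that agree with each item's most common label."""
    _check(runs)
    shares = [
        Counter(labels).most_common(1)[0][1] / len(labels)
        for labels in zip(*runs)
    ]
    return sum(shares) / len(shares)


# Hypothetical labels from three repeated runs over four texts.
runs = [
    ["pos", "neg", "neu", "pos"],
    ["pos", "neg", "pos", "pos"],
    ["pos", "pos", "neu", "pos"],
]

print(f"flip rate: {flip_rate(runs):.2f}")
print(f"modal agreement: {modal_agreement(runs):.2f}")
```

Running the same routine once at the default temperature and once at temperature 0 with a pinned model id would give directly comparable numbers for the two settings discussed in this sub-thread.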

cbarrie.bsky.social•128 days ago

3. That this has downstream consequences. Replicating recent published work with OpenAI LMs, we show that we would reach very different conclusions if we relied on different LMs by the same company...

cbarrie.bsky.social•128 days ago

We finish with advice to practitioners as well as encouragement for the discipline to take seriously open-source and locally versionable LMs

mscharkow.bsky.social•128 days ago

Good and timely stuff! I wonder how we can navigate the (very real) tradeoff between quality and reproducibility. Will reviewer #2 accept a drop of .2 in accuracy between gpt/gemini vs. local models if you make this decision transparent? What if results change? Just add another robustness check?

cbarrie.bsky.social•128 days ago

You can access the latest version of the paper here: https://drive.google.com/file/d/1wNDIkMZfAGoh4Oaojrgll9SPg3eT-YXz/view

oddletters.bsky.social•128 days ago

📌

eam0.bsky.social•61 days ago

📌

mfagundes.com.br•118 days ago

📌

virtuistic.bsky.social•128 days ago

📌

purekimminess.bsky.social•127 days ago

When is Arthur going to get a Bluesky account already? I miss his snarky methods posts!

juanjimenez.bsky.social•128 days ago

📌

ai-nikolai.bsky.social•93 days ago

Thank you for posting this work.

We are seeing very similar results in LLM agent research.

Would anyone be interested in a collaboration on reproducibility on that?

cbarrie.bsky.social•93 days ago

Yes I'd be interested in what you're working on! Would you mind following up by email?

cbpuschmann.bsky.social•126 days ago

📌

sharonhoward.bsky.social•128 days ago

📌

schnizzl.bsky.social•127 days ago

📌
