siree.sh - Profile | ThreadSky | a Reddit-style client for Bluesky

PhD student @ltiatcmu.bsky.social. Working on NLP that centers worker agency. Otherwise: coffee, fly fishing, and keeping peach pits around, for...some reason https://siree.sh

120 posts 2,372 followers 2,745 following

Posts 11 Comments 39

Gently, I would like to say: When people tell you that they would appreciate a feature that does something automatically, it's not responsive to that concern to explain that by going through several steps for every individual instance, they can get the same result in each instance.

submitted 10 days ago • 3 comments

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9

submitted 13 days ago • 1 comment

I for one am grateful for the opportunity to meditate on the meaning of “scientific artifact” at 2:15am

submitted 33 days ago • 1 comment

Great points in the replies here re: civility being an unequally applied, generally awful standard. It's also a great example of how common non-contextual formulations of "sentiment analysis" or "toxicity detection" in NLP lead to systems that stop sounding good under even slight scrutiny.

submitted 50 days ago • 0 comments

AI as another pivot to video is a thought I've also had, and this article is such a great articulation. The way "AI" is framed, pitched, and deployed by big players reflects at best ignorance, and at worst a real contempt for the social nature and humanity of our jobs and online spaces

submitted 52 days ago • 1 comment

This talk was such a joy to do! If you'd like to read the paper, it's here: arxiv.org/abs/2411.17840. Thank you for having us, @patrickbriansmith.bsky.social!

submitted 52 days ago • 1 comment

If you're at NAACL this week (or just want to keep track), I have a feed for you: bsky.app/profile/did:... Currently pulling everyone that mentions NAACL, posts a link from the ACL Anthology, or has NAACL in their username. Happy conferencing!

submitted 54 days ago • 1 comment

Ever trusted a metric that works great on average, only for it to fail in your specific use case? In our #NAACL2025 paper (w/ @841io.bsky.social), we show why global evaluations are not enough and why context matters more than you think. 📄 aclanthology.org/2025.finding... #NLP #Evaluation (🧵1/9)

submitted 54 days ago • 1 comment

🚀 Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools, appearing at #NAACL2025. This was work done at @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson 1/🧵

submitted 54 days ago • 1 comment

This is absurdly great, but I haven't read a single news article about it. A fully open source, offline-first alternative to Notion that's a collab between the French and German governments because they want to host docs securely and on their own terms. THIS is what Europe should be doing.

submitted 98 days ago • 27 comments

one thing in AI is not new -- people taking one small part of a job, mischaracterizing it, ignoring all the other stuff, and then assuming the AI can do the whole job on its own

submitted 125 days ago • 2 comments