dasmiq.bsky.social
Associate professor of computer science at Northeastern University. Natural language processing, digital humanities, OCR, computational bibliography, and computational social sciences. Artificial intelligence is an archival science.
323 posts 4,954 followers 285 following

More on LLM training dynamics from @jaydeepborkar.bsky.social: Even when personal information is sampled independently from the rest of the training data, interactions in term statistics can still increase leakage. arxiv.org/abs/2502.15680

Topical advice from 1861: If your friends have been unjustly fired by the federal government, be like Jessie B. Frémont. Write a BOOK that you sell to people to give as a CHRISTMAS PRESENT!

arxiv.org/abs/2502.19190 we're still tinkering, but couldn't wait to share.

“The US of AI,” public draft of a talk given yesterday at Princeton. drive.google.com/file/d/1O2qk...

My PhD student Akhila's been doing some incredible cultural work in the last few years! Check out our latest work on cultural safety and hand gestures, showing most vision and/or language AI systems are very cross-culturally unsafe!

My aging brain is trying to sell me on crackpot etymologies: English "dog" has a famously obscure origin, but has anyone proposed or rejected Welsh "dwg"? This imperative meaning "get!" or "fetch!" already appears as a command to a dog in the Old Welsh lullaby, Pais Dinogad.

Down a research rabbit hole, I’m reading the 1st page of an 1863 novel, *The Story of the Guard* where the narrator describes the arrival of magazines, books, & newspapers to their "quiet & remote" life thus: "Our brain-rations came twice a month" Brain. Rations. archive.org/details/stor...

Call for papers: Quotation Practices in News Media across Time, Formats, and Cultures fileshare.uibk.ac.at/f/76c7f13371...

A recording of a talk I gave at the Institute for Analytical Sociology at Linköping University, which is a pretty good summary of the "AI + humanities" work I completed last year. I loved that this audience had so many questions!

new OCR tools from the OLMo team: open, as god would have intended, did it exist

🌟Job ad🌟 We (@gregdnlp.bsky.social, @mattlease.bsky.social and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

encourage postdocs to apply 👇 @soldaini.net, myself and others from @ai2.bsky.social have been helping in the project & also learning a ton (continued pretraining, creating domain-specific training data & evals) to build foundation models that scientists can use. promising area for open source LMs!

More bad news for conference travel.

New preprint! Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard. We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3 #NLP #NLProc #polisky #polcom #compsocialsci 🐦🐦

I heard the NEA is only considering proposals related to 1776 & the USA’s 250th birthday & I was feeling intensely patriotic, so I’m printing these really big posters inspired *directly* by the Founding Fathers. The cut was carved from the masthead of a 19th-century newspaper.

Today we launch a new open research community. It is called ARBOR: arborproject.github.io/ please join us. bsky.app/profile/ajy...

New England hoisting the banner of zeugma!

I’m thinking about my dad, who for almost his entire career was a civil servant and worked in logistics. He had, only semi-ironically, a medal of Chester A. Arthur on the wall of his office. Other than that, although he had political convictions, he did not display them in public.

CRA statement about NSF firings cra.org/cuts-to-nsf-...

There's been a lot of work on "culture" in NLP, but not much agreement on what it is. A position paper by me, @dbamman.bsky.social, and @ibleaman.bsky.social on cultural NLP: what we want, what we have, and how sociocultural linguistics can clarify things. Website: naitian.org/culture-not-... 1/n

Are you interested in cultural transmission, medieval manuscripts or digital humanities, and want to pursue a PhD in a city bustling with intellectual and cultural life? Come work with us!

NEW: The National Science Foundation fired nearly 170 workers this morning. They include people who had already cleared their one-year probationary period only to have it changed to two years earlier this month. @kimzetter.bsky.social reports for @wired.com www.wired.com/story/nation...

This was such a fun project to work on! We release efficient classifiers 🌐 to partition large corpora, and use them to improve sampling for LLM pretraining. Great work led by @awettig.bsky.social 👇

This looks like a fantastic postdoc opportunity at Cornell, slated to work with a wonderful team. They're looking for someone who does English/cultural studies + DH, with emphasis on computational work.

Our paper didn't just SOTA on arXiv this week, it got a Test of Time award.

Kill them all. AI will figure it out.

3 speculations about cultural heritage funding: 1. Reporting on threats to the non-science agencies like NEH/NEA/IMLS will be scant. 2. There WILL be explicit zero-out threats to them, but not in the first wave: they're ignored in the Center for Renewing America budget doc AFAICT. 3. LOC is insulated.

17 employees helped save the Getty Villa from the fires. They're telling other museums how it was done 📜 laist.com/news/arts-an...

Sabotage. Criminal sabotage. "[The NSF] is planning to lay off between a quarter and a half of its staff in the next two months, a top National Science Foundation official said Tuesday." www.eenews.net/articles/sci...

"Democracy demands wisdom and vision in its citizens. It must therefore foster and support a form of education, and access to the arts & humanities, designed to make people of all backgrounds and wherever located masters of their technology--not its unthinking servants." --from 1965 act creating NEH

I release my first attempts at training a base model with GRPO. In a similar spirit to R0, this colab notebook transforms Pleias-350m into an RL poet without any post-training data, using only reward functions. t.co/tYSp8NYI1s
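For readers new to GRPO, here is a minimal Python sketch of its core step. It is an illustration only and not the linked notebook. For one prompt, you sample a group of completions, score each with reward functions, and normalize the rewards within the group to get per-completion advantages. The reward function `line_count_reward` and its 4-line target are hypothetical. The post does not say which rewards the poet model was trained on.

```python
import statistics
from typing import Callable, List

# HYPOTHETICAL reward: favors poems with exactly `target_lines` non-empty lines.
# Not taken from the Pleias-350m notebook; chosen only to illustrate the idea.
def line_count_reward(completion: str, target_lines: int = 4) -> float:
    n = len([ln for ln in completion.splitlines() if ln.strip()])
    return 1.0 / (1 + abs(n - target_lines))  # 1.0 at the target, decaying with distance

def group_relative_advantages(completions: List[str],
                              reward_fns: List[Callable[[str], float]],
                              eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for a group of completions sampled from the same prompt."""
    # Total reward per completion: the sum of all reward functions.
    rewards = [sum(fn(c) for fn in reward_fns) for c in completions]
    mean = statistics.fmean(rewards)
    # Population std here; implementations differ on this detail.
    std = statistics.pstdev(rewards)
    # Above-average completions get positive advantages, below-average get negative.
    return [(r - mean) / (std + eps) for r in rewards]

group = ["roses\nare\nred\nviolets", "roses\nare", "roses"]
print(group_relative_advantages(group, [line_count_reward]))
```

In a full training loop, these advantages would weight a clipped policy-gradient update of the model. The group normalization is what lets GRPO do without a separate learned value model, which is why reward functions alone are enough to drive training.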

DeepSeek R1 shows how important it is to be studying the internals of reasoning models. Try our code: Here @canrager.bsky.social shows a method for auditing AI bias by probing the internal monologue. dsthoughts.baulab.info I'd be interested in your thoughts.

We are searching for a new senior Program Manager for the cutting edge Network Science doctoral program @northeasternu.bsky.social! Come join us. northeastern.wd1.myworkdayjobs.com/careers/job/...

New @ai2.bsky.social paper accepted to #NAACL2025! We worked with Teaching Lab and ASSISTments to create a new dataset for evaluating VLMs. DrawEduMath includes 2,030 teacher-annotated images of students’ handwritten responses to math problems. Website: drawedumath.org

Not just the National Institutes of Health, now the National Science Foundation is put on pause. Reminder: research is a multi-step process. Pausing the peer review panels by itself creates months of delay.

Calling all digital classicists, humanists, annotators and various antiquists! Semantic Annotation for the Ancient World Conference in Rethymno and online - deadline February 1st! talos-ai4ssh.uoc.gr/events-page/...