stellaathena.bsky.social
I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.
248 posts 5,081 followers 347 following

After a short era in which people questioned the value of academia in ML, its value is more obvious than ever. Big labs stopped publishing the minute commercial incentives showed up and are relentlessly focused on a singular vision of scaling. Academia is a meaningful complement, bringing... 1/2

It was great to speak to @chronphilanthropy.bsky.social about AI research and how it's challenging yet necessary for non-profits to work in the LLM space. Many think we should leave the field to companies, but non-profits have different values and goals and those are important. shorturl.at/8IlmF

Proud to be at the AI Action Summit representing @eleutherai.bsky.social and the open source community. The focus on AI for the public good is exciting! DM me or @aviya.bsky.social to talk about centering openness, transparency, and public good in the AI ecosystem.

In case you're curious how much of a hellscape X is. I opened it today to get greeted with a porn ad. The censored section is a 20 second video clip of a woman sucking a dick, with audio. It autoplays.

I have a new favorite academic journal.

Discussions of AI training dynamics and alignment are often underpinned by formal or informal appeals to a probability distribution over models. If you've thought about such an argument, this is a must read. It's not often you get to say prior work is off by "millions of orders of magnitude"!

I enjoy the big ML conferences *and* ACL *and* COLM so I have at least 8 deadlines a year and should never have to crunch. In reality I have 8 deadlines a year and I’m in a state of perpetual crunch.

The CDC's political leadership is editing language that they dislike out of academic papers. The primary purpose and de facto consequence of this will be the massive reduction if not complete elimination of discussion of queer people from CDC publications. insidemedicine.substack.com/p/breaking-n...

Is anyone well-read in the DS-R1 tea leaves and confident they know what distillation method was used? It's not clear to me whether they mean "train on data from another model" or something I'd consider "actually distilling." My current guess is synthetic CoT?
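For anyone unsure what distinction the post is drawing, here's a toy sketch of the two senses of "distillation" (all names and values are mine for illustration, not anything from the DeepSeek report):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def hard_label_loss(student_logits, teacher_token):
    # "Train on data from another model": plain cross-entropy on
    # tokens the teacher actually sampled (e.g. synthetic CoT traces).
    p = softmax(student_logits)
    return float(-np.log(p[teacher_token]))

def soft_distill_loss(student_logits, teacher_logits, T=2.0):
    # "Actually distilling" in the Hinton sense: match the teacher's
    # full next-token distribution via a temperature-scaled KL term.
    pt = softmax(teacher_logits / T)
    ps = softmax(student_logits / T)
    return float(np.sum(pt * (np.log(pt) - np.log(ps))))
```

The first only needs teacher *outputs*; the second needs teacher *logits*, which is a much stronger access requirement and part of why the ambiguity matters.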

The American response to DeepSeek falsifies the claim that America shouldn't be an open source leader because China will just take it and build on it. If that were true, the response would be "lol thanks for giving us free results, idiot."

This is a crime. Trump is breaking the law. The NSF is violating its contractual obligations. And because of those crimes committed by the government, researchers will not get paid and on-going scientific research projects will be ruined. You can't just "pause" access to grants.

Obligatory "actually my lab invented test-time-compute" post. In "Stay on topic with Classifier-Free Guidance," we show that CFG enables a model to expend twice as much compute at inference time and match the performance of a model twice as large. arxiv.org/abs/2306.17806

If you've automated much of your code-writing, which models do you use and what does your workflow look like? I haven't found one I'm too happy with (but also haven't been trying that hard).