saxon.me
NLP/Vision+Language PhD Candidate @ UCSB. Evals, metrics, multilinguality, multiculturality, multimodality, and (dabbling in) reasoning. https://saxon.me/
233 posts 2,640 followers 667 following

broooo what if some of those that work forces are the same that burn crosses

This piece rhetorically asks: "Should the climate movement start demanding that everyone stop listening to Spotify? Would that be a good use of our time?" unfortunately I think many would say 'yes'. andymasley.substack.com/p/individual...

"The cost of living has increased, but the cost of owning has increased more" says the rent hike letter from a landlord who had funded 0 repairs since I've lived here, in CA with frozen property tax

A cheeseburger uses a lot more water than a ChatGPT request 🍔 Actual farms, not the data center variety, are sucking up groundwater more quickly than surface water, explains @markgongloff.bsky.social 🎥

Proposed to cut number of people involved in NSF activities by 70%. We are literally on the chopping board. Call your reps.

To be honest, I kinda love grok? (when it isn't being Elonbotomized to be a racism machine) So many rightoid maniacs query it expecting to see their conspiracist beliefs echoed back at them only to repeatedly get gently corrected with factual information lmao

I cannot stop thinking about Andor. Masterpiece, must watch for pretty much everyone imo

Sent my thesis in to my committee this week, will defend June 2 at 1pm PT! If you're interested in catching it on zoom, here's a calendar link! calendar.google.com/calendar/u/0...

Despite clickbaity title this is a great level-headed piece from a real scientist who tried working in AI for science. The key point that AI is a tool not an all encompassing revolution is common sense but the details are interesting and illuminating open.substack.com/pub/understa...

If we just add a few more annoying tasks for authors and a few more for reviewers we can fix peer review in AI!

According to a 2021 report, the University of California system: • generated $82B in economic activity in California • supported 529K jobs in the state • generated $21 in economic output for every $1 received Public divestment from higher ed makes no sense, even in the narrowest economic terms.

Michael News! I will be joining the Tech Policy Lab at the University of Washington @ischool.uw.edu and UW NLP working with @aylincaliskan.bsky.social as a postdoc in the fall, to work on situated evaluation, multimodal/lingual/cultural genAI, and new directions in safety, fairness, and alignment!

"Women are PIs on 58% of the canceled grants, although they are PIs on only 34% of all active NSF grants. Similarly, Blacks are PIs on 17% of the terminated grants, although they make only 4% of the total pool. Hispanic PIs and those with disabilities were twice as likely to lose a grant."

There's no escape! Even in my sister's bar admission ceremony the bar president starts talking about AI 🤣

We were interviewed for IEEE spectrum about reasoning models! spectrum.ieee.org/chain-of-tho...

What is it about the City of Berkeley and Country of England that makes interest in AI safety and weirder fringe stuff like AI consciousness so prevalent? Like why are these topics so big there and not in like Seattle or Pittsburgh??

Finally a study mix I can get behind www.youtube.com/watch?v=0tR5...

"LLM on way to replace doctors" gets published in Nature. meanwhile "LLM judgement not as good as human MDs" gets a spot in "Physical Therapy and Rehabilitation Journal".

Very interesting oral history -- interviews with some top NLP folks on the effects of GenAI on their field: www.quantamagazine.org/when-chatgpt...

We won an outstanding paper award!! 2025.naacl.org/blog/best-pa...

PSA for NAACL peeps from a southwest boi (sadly I won't be there): be sure to find a place to eat New Mexico style stacked enchiladas. You can get it "Christmas style" where it's served with both red and green hatch chile. The hatch chile is integral, do not skip. Not photogenic, but very delicious

I wondered if it could really be all that bad from the beginning, after all users are signing up to publicly interact with each other on a forum but woof, I don't think I would have signed off on this broad of a "the LM is allowed to impersonate this" policy

So basically, there was a Signal chat with tech folks and some Harpers Letter writers, and the Harpers folk were chased out when Andreessen realized they would not go along with censorship. But the tech guys stuck with Chris Rufo. www.semafor.com/article/04/2...

"Man it's sad that every single one of these trailers is a franchise sequel we're never gonna get an original movie again are we" he said, sitting in the theater to watch a rerelease of Revenge of the Sith

My stomach dropped when I saw the amount of quotes and replies... and a lot of the replies are about as aggressive and facile as I expected. note to self don't use the phrase "a n t i - A I" in a post lol

Always remember to thank your LM, it's a common courtesy like tipping your landlord

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly address the user’s prompt 🧵👇