emollick.bsky.social
Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence. Book: https://a.co/d/bC2kSj1 Substack: https://www.oneusefulthing.org/ Web: https://mgmt.wharton.upenn.edu/profile/emollick
1,290 posts 29,358 followers 145 following

Lots of neat stuff in this paper showing 30% of US Python commits use AI. As of the end of 2024: “the annual value of AI-assisted coding in the United States at $9.6−14.4 billion, rising to 64−96 billion if we assume higher estimates of productivity effects reported by randomized control trials”

Interesting attempt by Salesforce to create a benchmark for realistic business tasks - we need more of these! Worth tracking over time as new models come out (though I would love to see a contest, ARC-AGI style, to ask people to try to beat these benchmarks and see if they can with prompts & tools)

The 80% price drop for OpenAI's o3 came with no performance trade-offs. It joins the increasing evidence (like the average per-query energy cost of just 0.00034 kWh) that the cost of serving ChatGPT queries is lower than a lot of people have speculated.

This was less than almost every estimate I have seen: according to the latest Sam Altman post, the average ChatGPT query uses about the same amount of power as the average Google search in 2009 (the last time they released a per-search number)… 0.0003 kWh
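As a back-of-the-envelope check on what that per-query figure implies for serving cost, here is a quick sketch; the electricity price is my own illustrative assumption, not a number from either post:

```python
# Rough electricity cost implied by ~0.00034 kWh per query (figure from the posts above).
# The price per kWh is an assumed, illustrative US rate, not a reported number.
energy_per_query_kwh = 0.00034
assumed_price_per_kwh_usd = 0.08

cost_per_query = energy_per_query_kwh * assumed_price_per_kwh_usd
print(f"electricity cost per query:      ${cost_per_query:.7f}")                 # ~$0.0000272
print(f"electricity cost per 1M queries: ${cost_per_query * 1_000_000:,.2f}")    # ~$27.20
```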

Been playing with o3-pro for a bit before it launched. It is pretty "smart." One problem it solved where every other model has failed is making a word ladder from SPACE to EARTH. (Probably not contamination: the answer is different from the only online answer, which is for EARTH to SPACE in any case)
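For anyone unfamiliar with the puzzle: a word ladder turns one word into another by changing a single letter at a time, with every intermediate step being a real word. A minimal brute-force solver, assuming you already have a dictionary of valid five-letter words (the hard part for humans and, apparently, for most models), might look like this:

```python
from collections import deque
import string

def word_ladder(start, goal, dictionary):
    """Shortest chain of valid words from start to goal, changing one letter per step (BFS)."""
    words = {w.upper() for w in dictionary} | {start.upper(), goal.upper()}
    start, goal = start.upper(), goal.upper()
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        for i in range(len(word)):
            for letter in string.ascii_uppercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None  # no ladder exists within this dictionary

# e.g. word_ladder("SPACE", "EARTH", five_letter_words)  # five_letter_words is your own word list
```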

New RCT shows a familiar result on LLMs & medicine: Doctors given clinical vignettes produce significantly more accurate diagnoses when they also consult with a custom GPT built with the (obsolete) GPT-4 than doctors with Google/PubMed but not AI. Yet AI alone is as accurate as doctors + AI.

🚨We have a new prompting report: Prompting a model with Chain-of-Thought is a common prompt engineering technique, but we find simple Chain-of-Thought prompts generally don’t help recent frontier LLMs, including reasoning & non-reasoning models, perform any better (but do increase time & costs)
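For readers who haven't seen the technique: a "simple" Chain-of-Thought prompt is typically just the task plus an instruction to reason step by step. Here is a minimal sketch of the comparison using an OpenAI-style chat API; the client, model name, and question are illustrative placeholders, not the report's actual setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
question = "A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much is the ball?"

prompts = {
    "direct": question,
    "chain-of-thought": question + "\n\nLet's think step by step before giving a final answer.",
}

for label, prompt in prompts.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---\n{response.choices[0].message.content}\n")
```

The report's finding is that, for recent frontier models, the second variant rarely improves accuracy while producing longer (slower, costlier) outputs.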

Voice cloning is now trivially easy with open source AI tools that run on a PC, while live avatar videos of real people are easy with proprietary tools & a variety of open source tools are getting there. Very limited time to adjust legal & financial safeguards to new ways of authenticating people.

The new voice model from ElevenLabs is interesting & surprising. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's The Waste Land, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get right, but this was pretty good.

Example of why I think current LLMs are enough to change lots of work even if they don’t get better, once we start integrating them with other systems: GPT-4 (now obsolete) went from 30% accuracy to 87% accuracy in clinical oncology decisions when given access to tools. www.nature.com/articles/s43...

"Claude 4 Opus, build an elaborate game that makes it feel like I'm a brilliant chess player without knowing anything at all about chess. It should make me feel like I'm a grand master. Feel free to go as meta as you want."

If you read that “Diabolus Ex Machina” post, worth noting that the fact that AI systems cannot reliably follow links & instead just make up content is a long-standing issue dating back to the original ChatGPT 👇 It should be fixed, as it is a common, bad failure mode. (Though better models now handle this better)

Early evidence shows AI has big educational potential, but more in-class tests are a reasonable response to AI cheating risks as we figure out what to do next. Low-stakes testing is a powerful learning (not just assessment) tool: tests help you remember better, access unrelated knowledge & learn more.

AI use is ubiquitous & leads to performance gains at the individual level that are not passed on to organizations. In a representative survey of US workers, 43.2% now use generative AI at work. Those who do use it for 1/3 of their weekly tasks & report a tripling of productivity on those tasks.

I wrote a history of recent AI development in 32 images of otters using wifi on airplanes, from images to video to code. It shows two big trends: rapid improvements in AI models of all types and the growth of open weights AI models. www.oneusefulthing.org/p/the-recent...

After consideration, I will post occasionally, but heavily censor what I share compared to other sites. I tried making the transition, but talking about AI here is just really fraught in ways that are tough to mitigate & make it hard to have good discussions (the point of social!). Maybe it changes