emollick.bsky.social
Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence. Book: https://a.co/d/bC2kSj1 Substack: https://www.oneusefulthing.org/ Web: https://mgmt.wharton.upenn.edu/profile/emollick
1,290 posts 29,358 followers 145 following

Lots of neat stuff in this paper showing 30% of US Python commits use AI. As of the end of 2024: “the annual value of AI-assisted coding in the United States at $9.6−14.4 billion, rising to 64−96 billion if we assume higher estimates of productivity effects reported by randomized control trials”

Interesting attempt by Salesforce to create a benchmark for realistic business tasks - we need more of these! Worth tracking over time as new models come out (though I would love to see a contest, ARC-AGI style, to ask people to try to beat these benchmarks and see if they can with prompts & tools)

The 80% price drop for OpenAI's o3 came with no performance trade-offs. It joins the increasing evidence (like the average per-query energy cost of just 0.00034 kWh) that the cost of serving ChatGPT queries is lower than a lot of people have speculated.

This was less than almost every estimate I have seen: according to the latest Sam Altman post, the average ChatGPT query uses about the same amount of power as the average Google search in 2009 (the last time they released a per-search number)… 0.0003 kWh
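As a back-of-the-envelope check on what that per-query figure implies for serving cost, here is a quick sketch; the electricity price is my own illustrative assumption, not a number from either post:

```python
# Rough electricity cost implied by ~0.00034 kWh per query (figure from the posts above).
# The price per kWh is an assumed, illustrative US rate, not a reported number.
energy_per_query_kwh = 0.00034
assumed_price_per_kwh_usd = 0.08

cost_per_query = energy_per_query_kwh * assumed_price_per_kwh_usd
print(f"electricity cost per query:      ${cost_per_query:.7f}")                 # ~$0.0000272
print(f"electricity cost per 1M queries: ${cost_per_query * 1_000_000:,.2f}")    # ~$27.20
```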

Been playing with o3-pro for a bit before it launched. It is pretty "smart." One problem it solved where every other model has failed is making a word ladder from SPACE to EARTH. (Probably not contamination: the answer is different from the only online answer, which is for EARTH to SPACE in any case)
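For anyone unfamiliar with the puzzle: a word ladder turns one word into another by changing a single letter at a time, with every intermediate step being a real word. A minimal brute-force solver, assuming you already have a dictionary of valid five-letter words (the hard part for humans and, apparently, for most models), might look like this:

```python
from collections import deque
import string

def word_ladder(start, goal, dictionary):
    """Shortest chain of valid words from start to goal, changing one letter per step (BFS)."""
    words = {w.upper() for w in dictionary} | {start.upper(), goal.upper()}
    start, goal = start.upper(), goal.upper()
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == goal:
            return path
        for i in range(len(word)):
            for letter in string.ascii_uppercase:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None  # no ladder exists within this dictionary

# e.g. word_ladder("SPACE", "EARTH", five_letter_words)  # five_letter_words is your own word list
```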

New RCT shows a familiar result on LLMs & medicine: Doctors given clinical vignettes produce significantly more accurate diagnoses when they also consult with a custom GPT built with the (obsolete) GPT-4 than doctors with Google/PubMed but not AI. Yet AI alone is as accurate as doctors + AI.

🚨We have a new prompting report: Prompting a model with Chain-of-Thought is a common prompt engineering technique, but we find simple Chain-of-Thought prompts generally don’t help recent frontier LLMs, including reasoning & non-reasoning models, perform any better (but do increase time & costs)
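For readers who haven't seen the technique: a "simple" Chain-of-Thought prompt is typically just the task plus an instruction to reason step by step. Here is a minimal sketch of the comparison using an OpenAI-style chat API; the client, model name, and question are illustrative placeholders, not the report's actual setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
question = "A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much is the ball?"

prompts = {
    "direct": question,
    "chain-of-thought": question + "\n\nLet's think step by step before giving a final answer.",
}

for label, prompt in prompts.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---\n{response.choices[0].message.content}\n")
```

The report's finding is that, for recent frontier models, the second variant rarely improves accuracy while producing longer (slower, costlier) outputs.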

Voice cloning is now trivially easy with open source AI tools that run on a PC, while live avatar videos of real people are easy with proprietary tools & a variety of open source tools are getting there. Very limited time to adjust legal & financial safeguards to new ways of authenticating people.

The new voice model from ElevenLabs is interesting & surprising. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's The Waste Land, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get right, but this was pretty good.

Example of why I think current LLMs are enough to change lots of work even if they don’t get better, once we start integrating them with other systems: GPT-4 (now obsolete) went from 30% accuracy to 87% accuracy in clinical oncology decisions when given access to tools. www.nature.com/articles/s43...

"Claude 4 Opus, build an elaborate game that makes it feel like I'm a brilliant chess player without knowing anything at all about chess. It should make me feel like I'm a grand master. Feel free to go as meta as you want."

If you read that “Diabolus Ex Machina” post, worth noting that the fact that AI systems cannot reliably follow links & instead just make up content is a long-standing issue dating back to the original ChatGPT 👇 It should be fixed, as it is a common, bad failure mode. (Though better models now handle this better)

Early evidence shows AI has big educational potential, but more in-class tests are a reasonable response to AI cheating risks as we figure out what to do next. Low-stakes testing is a powerful learning (not just assessment) tool: tests help you remember better, access unrelated knowledge & learn more.

AI use is ubiquitous & leads to performance gains at the individual level that are not passed on to organizations. In a representative survey of US workers, 43.2% now use generative AI at work. Those who do use it for 1/3 of their weekly tasks & report a tripling of productivity on those tasks.

I wrote a history of recent AI development in 32 images of otters using wifi on airplanes, from images to video to code. It shows two big trends: rapid improvements in AI models of all types and the growth of open weights AI models. www.oneusefulthing.org/p/the-recent...

After consideration, I will post occasionally, but heavily censor what I share compared to other sites. I tried making the transition, but talking about AI here is just really fraught in ways that are tough to mitigate & make it hard to have good discussions (the point of social!). Maybe it changes