davidduvenaud.bsky.social
Machine learning prof at U Toronto. Working on evals and AGI governance.
48 posts · 873 followers · 134 following
comment in response to post
It’ll be co-located with ICML. Our workshop is a separate event, so no need to register for ICML to attend ours! Ours is free but invite-only; please apply on our site: www.post-agi.org. Co-organized with Raymond Douglas, Nora Ammann, @kulveit.bsky.social, and @davidskrueger.bsky.social
comment in response to post
- Are there multiple, qualitatively different basins of attraction of future civilizations?
- Do Malthusian conditions necessarily make it hard to preserve uncompetitive, idiosyncratic values?
- What empirical evidence could help us tell which trajectory we’re on?
comment in response to post
Some empirical questions we hope to discuss:
- Could alignment of single AIs to single humans be sufficient to solve global coordination problems?
- Will agency tend to operate at ever-larger scales, multiple scales, or something else?
comment in response to post
Some concrete topics we hope to address:
- What future trajectories are plausible?
- What mechanisms could support long-term legacies?
- New theories of agency, power, and social dynamics.
- AI representatives and new coordination mechanisms.
- How will AI alter cultural evolution?
comment in response to post
And Anna Yelizarov, @fbarez.bsky.social, @scasper.bsky.social, Beatrice Erkers, among others. We'll draw from political theory, cooperative AI, economics, mechanism design, history, and hierarchical agency.
comment in response to post
Thanks for explaining, but I'm still confused. LLMs succeed regularly at following complex natural-language instructions without examples - it's their bread and butter. I agree they sometimes have problems executing algorithms consistently (unless fine-tuned to do so), but so do untrained humans.
comment in response to post
"only those individuals who explicitly understood a task (via a natural language explanation) reached a correct solution whereas implicit trial and error reinforcement failed to converge. This ... has yet to be demonstrated in an LLM." Is this claiming LLMs haven't been shown to benefit from hints?
comment in response to post
Thanks for clarifying. I agree that singularitarian scenarios can be naive, breathless, and simplistic. But if it’s plausible that AI will make most humans unemployable, this piece seems to me to overstate its case. I’d love to hear your thoughts about life after most work is automated, if you have time.
comment in response to post
At that point, redistribution would be a life-or-death matter, and also would be disincentivized by competition within and between states. In your story, a Deus Ex Machina saved the protagonist, but I don't have a clear picture of what a realistic equilibrium would look like. Do you?
comment in response to post
I'm confused why you're confident the downsides will be manageable. As you depicted in The Discrete Charm of the Turing Machine, even just being able to copy the best machine performers (near the level of the best humans) would make almost every human unable to compete, permanently.
comment in response to post
We realize lots of people have worked on these before, or are already working on them now! We just wanted to list the main directions we're excited about that still seem wide open. We're all ears for suggestions!
comment in response to post
10. Understand AI Agency. What does the world look like when there are 100,000 exact copies of yourself? When you can design bespoke sub-agents or formally commit to following a policy? It’s not even clear what the natural unit of identity is for an AI.
comment in response to post
9. Simulate entire civilizations! Using LLMs, we can run tests on entire (simplified) civilizations. This can be a proxy for emergent human phenomena like cultural development, and could help characterize possible AI civilizations.
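As a rough illustration of the kind of setup we have in mind, here is a minimal sketch of a toy LLM civilization loop. It assumes an OpenAI-style chat client; the roles, prompts, model name, and round count are placeholder choices for illustration, not a tested design.

```python
# Toy "civilization" loop: a few LLM role-playing agents acting in rounds,
# with a shared public record standing in for culture/institutions.
# Assumes the OpenAI Python client; model name, roles, and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

agents = [{"name": role, "memory": []} for role in ("farmer", "merchant", "scribe")]
shared_history = []  # public record visible to every agent

for round_num in range(5):
    for agent in agents:
        prompt = (
            f"You are a {agent['name']} in a small simulated society.\n"
            f"Public history so far: {shared_history[-10:]}\n"
            f"Your private memory: {agent['memory'][-5:]}\n"
            "In one sentence, say what you do this round."
        )
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        action = reply.choices[0].message.content.strip()
        agent["memory"].append(action)
        shared_history.append(f"Round {round_num}, {agent['name']}: {action}")

# Inspect shared_history afterwards for emergent norms, coordination, or drift.
```

Even a sketch this simple yields a shared public record that can be inspected for emergent norms, coordination, or cultural drift.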
comment in response to post
8. AI Complementarity. Most work is aimed at AI agents that fully replace humans, partly because it’s easier to get fast feedback. Can we build benchmarks or evaluations that reward supporting humans? Are there other ways to nudge things more towards augmentation?
comment in response to post
7. Civilizational alignment and hierarchical agency. We might be able to formally model parts of civilizational dynamics: something like game theory or information theory, but capable of explaining phenomena like the rise of some religions or the historical instability of even the most powerful regimes.
comment in response to post
6. Differential Progress. Some techs might extend human agency:
- Superhuman mediation, bargaining, and arbitration
- Privacy-preserving disclosure
- Collective decision-making mechanisms
- Provable neutrality
Developing these public goods might delay gradual disempowerment.
comment in response to post
5. Interaction with other dynamics. What happens when gradual disempowerment, misalignment, recursive self-improvement, coups, and other dynamics all play out at once? We should study their likely interactions, and tradeoffs in mitigations. www.lesswrong.com/posts/6kBMqr...
comment in response to post
4. Robustness of the basics of society. Property rights might preserve human influence. But they haven’t always! Let’s study the stability of things like currency, human influence, and basic rights. We can model when these become unstable, in theory and historically.
comment in response to post
3. Study Historical Parallels. There have been major tech-driven power transfers before: the Meiji Restoration, the fall of the English aristocracy, the printing press, and the Trail of Tears. What can such examples tell us about how things might play out? www.lesswrong.com/posts/bmmFLo...
comment in response to post
2. Clarify the goal. What are the feasible good futures? Can idiosyncratic values survive in the face of competition? Post-AGI, how could humans have influence? What kind of peaceful relationships could we even in principle have with AGIs?
comment in response to post
1. Respond to counter-arguments. We got lots of thoughtful pushback: Could we just live off index funds? Can comparative advantage save us? Would aligned AI solve this? It’d be valuable to clarify the best version of these arguments, and under what circumstances they hold water.
comment in response to post
Hmmm, I agree it might be harder to tell if someone is secretly trying to undermine your argument rather than learn from you. But if my opponents are forced to consider and respond to a sensible form of my position and clarify their own position, that seems like a win for everybody, no?
comment in response to post
Thanks for explaining, makes sense.
comment in response to post
Cool work, and I appreciate that the protocol is simple, but it sounds like overkill to only transmit best actions. What kind of agent can update in a calibrated way on someone else's best action, while not being able to express their own vector of expectations?
comment in response to post
Thanks. Care to share a link?
comment in response to post
Link to arXiv version: arxiv.org/pdf/2501.16946
comment in response to post
We hope we’re wrong! If you think so, or you have a plan, please articulate it! We wrote this paper because we want people to discuss the future with eyes wide open. Solving this will probably require help from economists, historians, mathematicians, political theorists, etc.
comment in response to post
As for what to do about it: @richardngo.bsky.social has been pitching the idea of AI-enabled “unprecedentedly trustworthy institutions” as a way to navigate the tough governance challenges we face: x.com/RichardMCNgo...
comment in response to post
Dan Hendrycks argued that evolutionary pressures generally favor selfish species, likely including future AIs, and that this may lead to human extinction. Our paper talks about the likely dynamics of this process. arxiv.org/abs/2303.16200
comment in response to post
@robinhanson.bsky.social described a similar scenario in “Age of Em”, where bio humans are simply too slow to compete. He also recently pointed out the dangers of cultural drift at a civilizational scale, e.g.: www.overcomingbias.com/p/how-fix-cu...
comment in response to post
@critch.bsky.social described “a gradual handing-over of control from humans to AI systems, driven by competitive pressures for institutions to … preferentially engag[e] with other fully automated companies.” www.lesswrong.com/posts/Kobbt3...
comment in response to post
@akorinek.bsky.social and Nobel laureate @josephestiglitz.bsky.social suggest that AGI might reintroduce Malthusian dynamics: AI that can replace human labor could make basic necessities unaffordable for humans, leaving them too weak to preserve their property rights: www.nber.org/system/files...
comment in response to post
Paul Christiano (head of the US AI Safety Institute) described ‘a slow-rolling catastrophe’ where humans can’t effectively oversee a machine economy: www.alignmentforum.org/posts/HBxe6w...
comment in response to post
Others have made similar points before. Some work that inspired us:
comment in response to post
Things we don’t think will be sufficient to safeguard human interests:
- Current alignment plans
- Universal basic income
- Comparative advantage
- Accrued capital + property rights
- Asking to keep Earth as a bio-human nature preserve
comment in response to post
What can be done to avoid gradual human disempowerment? We don’t know, but improving group reasoning and decision-making while humans still have influence is probably a net benefit. Our ability to even perceive, predict, and coordinate could be much better.
comment in response to post
We think this is a predictable way in which major labs and governments’ current plans to address AI risk will fail. To the extent they even address catastrophic risk, they mainly focus on takeover and misuse rather than the systemic problems we describe here.
comment in response to post
Still, wouldn't humans notice what's happening and coordinate to stop it? We argue that it’ll be surprisingly hard to coordinate once we start being displaced, because our culture and governance will change in ways untethered to human interests.
comment in response to post
Once humans aren’t necessary for economic growth, our institutions and states will have incentives to systematically displace humans, even if everyone involved would prefer not to. The fragile alignment of state interests with human interests will further weaken.
comment in response to post
Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not.
comment in response to post
Loss of human influence will be centrally driven by having more competitive machine alternatives to humans in economic labor, decision making, cultural creation, and even companionship.
comment in response to post
A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that this was only because of the necessity of human participation for thriving economies, states, and cultures.
comment in response to post
The major takeaways: 1) No one has a concrete plausible plan for stopping gradual human disempowerment. 2) Aligning individual AI systems with their designers’ intentions is not sufficient. This is because our civilization and institutions aren’t robustly aligned with humans.