davidduvenaud.bsky.social
Machine learning prof at U Toronto. Working on evals and AGI governance.
48 posts · 873 followers · 134 following
comment in response to post
It’ll be co-located with ICML. Our workshop is a separate event, so no need to register for ICML to attend ours! Ours is free but invite-only; please apply on our site: www.post-agi.org. Co-organized with Raymond Douglas, Nora Ammann, @kulveit.bsky.social, and @davidskrueger.bsky.social
comment in response to post
- Are there multiple, qualitatively different basins of attraction of future civilizations?
- Do Malthusian conditions necessarily make it hard to preserve uncompetitive, idiosyncratic values?
- What empirical evidence could help us tell which trajectory we’re on?
comment in response to post
Some empirical questions we hope to discuss:
- Could alignment of single AIs to single humans be sufficient to solve global coordination problems?
- Will agency tend to operate at ever-larger scales, multiple scales, or something else?
comment in response to post
Some concrete topics we hope to address:
- What future trajectories are plausible?
- What mechanisms could support long-term legacies?
- New theories of agency, power, and social dynamics.
- AI representatives and new coordination mechanisms.
- How will AI alter cultural evolution?
comment in response to post
And Anna Yelizarov, @fbarez.bsky.social, @scasper.bsky.social, Beatrice Erkers, among others. We'll draw from political theory, cooperative AI, economics, mechanism design, history, and hierarchical agency.
comment in response to post
Thanks for explaining, but I'm still confused. LLMs succeed regularly at following complex natural-language instructions without examples - it's their bread and butter. I agree they sometimes have problems executing algorithms consistently (unless fine-tuned to do so), but so do untrained humans.
comment in response to post
"only those individuals who explicitly understood a task (via a natural language explanation) reached a correct solution whereas implicit trial and error reinforcement failed to converge. This ... has yet to be demonstrated in an LLM." Is this claiming LLMs haven't been shown to benefit from hints?
comment in response to post
Thanks for clarifying. I agree that singularitarian scenarios can be naive, breathless, and simplistic. But if it’s plausible that AI will make most humans unemployable, this piece seems to me to overstate its case. I’d love to hear your thoughts about life after most work is automated, if you have time.
comment in response to post
At that point, redistribution would be a life-or-death matter, and also would be disincentivized by competition within and between states. In your story, a Deus Ex Machina saved the protagonist, but I don't have a clear picture of what a realistic equilibrium would look like. Do you?
comment in response to post
I'm confused why you're confident the downsides will be manageable. As you depicted in The Discrete Charm of the Turing Machine, even just being able to copy the best machine performers (near the level of the best humans) would make almost every human unable to compete, permanently.
comment in response to post
We realize lots of people have worked on these before, or are already working on them now! We just wanted to list the main directions we're excited about that still seem wide open. We're all ears for suggestions!
comment in response to post
10. Understand AI Agency. What does the world look like when there are 100,000 exact copies of yourself? When you can design bespoke sub-agents or formally commit to following a policy? It’s not even clear what the natural unit of identity is for an AI.
comment in response to post
9. Simulate entire civilizations! Using LLMs, we can run tests on entire (simplified) civilizations. This can be a proxy for emergent human phenomena like cultural development, and could help characterize possible AI civilizations.
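As a rough illustration of the kind of setup we have in mind, here is a minimal sketch of a toy LLM civilization loop. It assumes an OpenAI-style chat client; the roles, prompts, model name, and round count are placeholder choices for illustration, not a tested design.

```python
# Toy "civilization" loop: a few LLM role-playing agents acting in rounds,
# with a shared public record standing in for culture/institutions.
# Assumes the OpenAI Python client; model name, roles, and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

agents = [{"name": role, "memory": []} for role in ("farmer", "merchant", "scribe")]
shared_history = []  # public record visible to every agent

for round_num in range(5):
    for agent in agents:
        prompt = (
            f"You are a {agent['name']} in a small simulated society.\n"
            f"Public history so far: {shared_history[-10:]}\n"
            f"Your private memory: {agent['memory'][-5:]}\n"
            "In one sentence, say what you do this round."
        )
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        action = reply.choices[0].message.content.strip()
        agent["memory"].append(action)
        shared_history.append(f"Round {round_num}, {agent['name']}: {action}")

# Inspect shared_history afterwards for emergent norms, coordination, or drift.
```

Even a sketch this simple yields a shared public record that can be inspected for emergent norms, coordination, or cultural drift.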
comment in response to post
8. AI Complementarity. Most work is aimed at AI agents that fully replace humans, partly because it’s easier to get fast feedback. Can we build benchmarks or evaluations that reward supporting humans? Are there other ways to nudge things more towards augmentation?
comment in response to post
7. Civilizational alignment and hierarchical agency. We might be able to formally model parts of civilizational dynamics: something like game theory or information theory, but capable of explaining phenomena like the rise of some religions or the historical instability of even the most powerful regimes.
comment in response to post
6. Differential Progress. Some techs might extend human agency:
- Superhuman mediation, bargaining, and arbitration
- Privacy-preserving disclosure
- Collective decision-making mechanisms
- Provable neutrality
Developing these public goods might delay gradual disempowerment.
comment in response to post
5. Interaction with other dynamics. What happens when gradual disempowerment, misalignment, recursive self-improvement, coups, and other dynamics all play out at once? We should study their likely interactions, and tradeoffs in mitigations. www.lesswrong.com/posts/6kBMqr...
comment in response to post
4. Robustness of the basics of society. Property rights might preserve human influence. But they haven’t always! Let’s study the stability of things like currency, human influence, and basic rights. We can model when these become unstable, in theory and historically.
comment in response to post
3. Study Historical Parallels. There have been major tech-driven power transfers before: the Meiji Restoration, the fall of the English aristocracy, the printing press, and the Trail of Tears. What can such examples tell us about how things might play out? www.lesswrong.com/posts/bmmFLo...
comment in response to post
2. Clarify the goal. What are the feasible good futures? Can idiosyncratic values survive in the face of competition? Post-AGI, how could humans have influence? What kind of peaceful relationships could we even in principle have with AGIs?
comment in response to post
1. Respond to counter-arguments. We got lots of thoughtful pushback: Could we just live off index funds? Can comparative advantage save us? Would aligned AI solve this? It’d be valuable to clarify the best version of these arguments, and under what circumstances they hold water.
comment in response to post
Hmmm, I agree it might be harder to tell if someone is secretly trying to undermine your argument rather than learn from you. But if my opponents are forced to consider and respond to a sensible form of my position and clarify their own position, that seems like a win for everybody, no?
comment in response to post
Thanks for explaining, makes sense.
comment in response to post
Cool work, and I appreciate that the protocol is simple, but it sounds like overkill to only transmit best actions. What kind of agent can update in a calibrated way on someone else's best action, while not being able to express their own vector of expectations?
comment in response to post
Thanks. Care to share a link?
comment in response to post
Link to arXiv version: arxiv.org/pdf/2501.16946
comment in response to post
We hope we’re wrong! If you think so, or you have a plan, please articulate it! We wrote this paper because we want people to discuss the future with eyes wide open. Solving this will probably require help from economists, historians, mathematicians, political theorists, etc.
comment in response to post
As for what to do about it: @richardngo.bsky.social has been pitching the idea of AI-enabled “unprecedentedly trustworthy institutions” as a way to navigate the tough governance challenges we face: x.com/RichardMCNgo...
comment in response to post
Dan Hendrycks argued that evolutionary pressures generally favor selfish species, likely including future AIs, and that this may lead to human extinction. Our paper talks about the likely dynamics of this process. arxiv.org/abs/2303.16200
comment in response to post
@robinhanson.bsky.social described a similar scenario in “Age of Em”, where bio humans are simply too slow to compete. He also recently pointed out the dangers of cultural drift at a civilizational scale, e.g.: www.overcomingbias.com/p/how-fix-cu...
comment in response to post
@critch.bsky.social described “a gradual handing-over of control from humans to AI systems, driven by competitive pressures for institutions to … preferentially engag[e] with other fully automated companies.” www.lesswrong.com/posts/Kobbt3...
comment in response to post
@akorinek.bsky.social and Nobel laureate @josephestiglitz.bsky.social suggest that AGI might reintroduce Malthusian dynamics: AI that can replace human labor could make basic necessities unaffordable for humans, leaving them too weak to preserve their property rights: www.nber.org/system/files...
comment in response to post
Paul Christiano (head of the US AI Safety Institute) described ‘a slow-rolling catastrophe’ where humans can’t effectively oversee a machine economy: www.alignmentforum.org/posts/HBxe6w...
comment in response to post
Others have made similar points before. Some work that inspired us:
comment in response to post
Things we don’t think will be sufficient to safeguard human interests:
- Current alignment plans
- Universal basic income
- Comparative advantage
- Accrued capital + property rights
- Asking to keep Earth as a bio-human nature preserve
comment in response to post
What can be done to avoid gradual human disempowerment? We don’t know, but improving group reasoning and decision-making while humans still have influence is probably a net benefit. Our ability to even perceive, predict, and coordinate could be much better.
comment in response to post
We think this is a predictable way in which major labs and governments’ current plans to address AI risk will fail. To the extent they even address catastrophic risk, they mainly focus on takeover and misuse rather than the systemic problems we describe here.
comment in response to post
Still, wouldn't humans notice what's happening and coordinate to stop it? We argue that it’ll be surprisingly hard to coordinate once we start being displaced, because our culture and governance will change in ways untethered to human interests.
comment in response to post
Once humans aren’t necessary for economic growth, our institutions and states will have incentives to systematically displace humans, even if everyone involved would prefer not to. The fragile alignment of state interests with human interests will further weaken.
comment in response to post
Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not.
comment in response to post
Loss of human influence will be centrally driven by having more competitive machine alternatives to humans in economic labor, decision making, cultural creation, and even companionship.
comment in response to post
A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that this was only because of the necessity of human participation for thriving economies, states, and cultures.
comment in response to post
The major takeaways: 1) No one has a concrete plausible plan for stopping gradual human disempowerment. 2) Aligning individual AI systems with their designers’ intentions is not sufficient. This is because our civilization and institutions aren’t robustly aligned with humans.