stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
138 posts
2,762 followers
82 following
comment in response to
post
POV: You unfollowed every bsky account that posts too much stressful content about the Trump administration
comment in response to
post
I’m more pessimistic than you about the feasibility of “WBE without understanding”, for reasons here → www.lesswrong.com/posts/wByPb6... . I’m also less pessimistic about the feasibility of gaining sufficient understanding for WBE; see the last paragraph here ↓
comment in response to
post
…So, when LLMs “talk about themselves”, we learn SOMETHING from what they say, but it needs to be interpreted carefully.
I’m not making any particular point; I just find this a helpful framing, and I find myself bringing it up in conversation from time to time. (2/2)
comment in response to
post
if only the people saying “if only those people had studied history, then they would have avoided such mistakes” had studied history, then they would have avoided such mistakes
comment in response to
post
Jumping right into Bulverism, there’s a common tendency to treat ego-dystonic “urges” as coming from human brain RL and ego-syntonic “desires” as coming from our ethereal souls via free will. I’ve written about why that mistake feels so intuitive, see here: www.lesswrong.com/posts/7tNq4h... 3/3
comment in response to
post
Whenever you see a human ignoring short-term pleasure for long-term goals or values, you can thank human brain RL for that, just as much as you can thank human brain RL for when it happens the other way around. 2/3
comment in response to
post
Today’s RL algorithms are usually not great at long-term planning in complex environments, mainly because long-term planning in complex environments is a hard problem, e.g. the combinatorial explosion of possible action sequences. (So much the worse for today’s RL algorithms!) But I don’t think that relates to humans 1/3
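(To make the “combinatorial explosion” concrete, here’s a toy back-of-the-envelope sketch; the branching factor and horizon are made-up numbers for illustration, not anything from the thread.)

```python
# Toy illustration of combinatorial explosion in long-horizon planning.
# Both numbers are hypothetical, chosen only for illustration.
branching_factor = 10  # assumed number of available actions per step
horizon = 20           # assumed planning depth (steps of lookahead)

num_action_sequences = branching_factor ** horizon
print(f"{num_action_sequences:.3e} possible action sequences")
# -> 1.000e+20, far too many to enumerate exhaustively
```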
comment in response to
post
Yeah, the current “JargonBot” has hover-over pop-up definitions, and that was the LaTeX plan / idea too.
Not sure about a chat interface. I feel like lesswrong dev people were talking about things like that too, but they haven’t shipped anything; I dunno the details.
comment in response to
post
Related(ish): A lesswrong dev was working towards using LLMs for mouse-over definitions within LaTeX formulas in blog posts. He started by releasing a simpler but related feature—auto-defining jargon in plain text. Not sure what the status is for the LaTeX thing. www.lesswrong.com/posts/sZvMLW...
comment in response to
post
I imagine you’ve already seen these, but I for one enjoyed Joe Carlsmith’s pair of essays related to that: (1) “Why should ethical anti-realists do ethics?”
joecarlsmith.com/2023/02/16/w... (2) “Seeing more whole” joecarlsmith.com/2023/02/17/s...
comment in response to
post
Funny! Claude Sonnet gets these correct though. Do you know what the Google “AI Overview” model is?
comment in response to
post
OK, so lesswrong.com is a mix of (1) a blogging platform (same genre as wordpress) and (2) a community forum (same genre as a subreddit) with, umm, idiosyncratic focus.
For the blogging platform part, they built their own blog-writing software. The margin thing is just one of the built-in features.
comment in response to
post
Again, here’s the link to the post! I’m learning as I go, happy for feedback & discussion! Thanks @ent3c.bsky.social for your book, which was exceptionally clear and helpful, even if I wound up disagreeing with parts of it. (7/7) www.lesswrong.com/posts/xXtDCe...
comment in response to
post
Fifth, the context of trying to understand some outcome (schizophrenia, extroversion, or whatever) by studying the genes that correlate with it. I argue that this activity is useful on the margin and can be done well, despite the humiliating “candidate gene” fiasco of a couple decades ago. (6/7)
comment in response to
post
Fourth, the context of polygenic scores (PGS) and the Missing Heritability Problem. I’m a big advocate for “epistasis”, i.e. the idea that the map from genome to (some but not all) outcomes is nonlinear. I argue that epistasis in human outcomes is widely misunderstood even by experts. (5/7)
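(A minimal toy sketch of what a “nonlinear map” can look like, entirely my own illustration rather than anything from the thread: an outcome driven partly by an interaction between two hypothetical loci, which a purely additive model, the kind a linear polygenic score assumes, only partly captures.)

```python
import numpy as np

# Toy sketch of epistasis (my own illustration, not from the thread):
# the outcome depends on an interaction between two loci, so a purely
# additive (linear) model captures only part of the variance.
rng = np.random.default_rng(0)
n_people, n_loci = 10_000, 2
genotypes = rng.integers(0, 3, size=(n_people, n_loci))  # 0/1/2 allele counts

outcome = (1.0 * genotypes[:, 0]                      # additive effect
           + 2.0 * genotypes[:, 0] * genotypes[:, 1]  # epistatic (interaction) effect
           + rng.normal(0.0, 1.0, n_people))          # noise / environment

# Best additive fit, i.e. what a linear polygenic score assumes.
X = np.column_stack([np.ones(n_people), genotypes])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
residual = outcome - X @ beta
r2_additive = 1.0 - residual.var() / outcome.var()
print(f"Variance explained by the additive model: {r2_additive:.2f}")
# Below 1.0 even with no noise in the interaction term: some variance
# is simply out of reach of any additive model.
```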
comment in response to
post
Third, the context of assessing whether YOU should try hard to be your best self, or whether “I shouldn’t bother because my fate is determined by my genes”. (Spoiler: it’s the former!) (4/7)
comment in response to
post
Second, the context of parenting decisions. I bring up what I call “the bio-determinist child-rearing rule-of-thumb”, why we should believe it, and its broader lessons for how to think about childhood … AND the many important cases where it DOESN’T apply!! (3/7)
comment in response to
post
First, the context of guessing someone’s likely adult traits (disease risk, personality, …) based on their family history and childhood environment. That leads to twin and adoption studies. How do they work, what are the assumptions, what does “E” *really* mean, etc.? (2/7)
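(For reference, here is my gloss of the standard textbook ACE decomposition that classical twin studies rest on; it’s background I’m supplying, not something quoted from the thread.)

```latex
% My gloss of the standard ACE model behind classical twin studies
% (textbook identities, not quoted from the thread).
\documentclass{article}
\usepackage{amsmath}
\begin{document}
With the phenotype $P$ standardized to unit variance
($A$ = additive genetics, $C$ = shared environment,
$E$ = non-shared environment plus measurement error):
\[
  \mathrm{Var}(P) = a^2 + c^2 + e^2 = 1, \qquad
  r_{\mathrm{MZ}} = a^2 + c^2, \qquad
  r_{\mathrm{DZ}} = \tfrac{1}{2}a^2 + c^2 ,
\]
so Falconer's formulas give
\[
  a^2 = 2\,(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}), \qquad
  c^2 = 2\,r_{\mathrm{DZ}} - r_{\mathrm{MZ}}, \qquad
  e^2 = 1 - r_{\mathrm{MZ}} .
\]
\end{document}
```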
comment in response to
post
There’s a lyric in Frozen 2: “You feel what you feel, and those feelings are real”. I repeat it all the time, probably every couple days. Context is usually something like telling a kid “you’re allowed to feel annoyed at me, as long as you’re following house rules”. Things like that.
comment in response to
post
Probably not the kind of “detailed analysis” you’re looking for but I casually discuss background considerations & intuitions in §3.2-3.3 of www.alignmentforum.org/posts/LJD4C7...
Also, long back-and-forth w Matt Clancy ending here: x.com/steve47285/s...
(This is about future AI not Jan 2025 AI)
comment in response to
post
I think you’d like this series of 8 essays, on the project of getting from neuroscience algorithms to introspective self-reports. (Very illusionism-adjacent! See §1.6.) “Self” comes up in almost every post, including exotic manifestations like trance & dissociation. www.lesswrong.com/posts/FtwMA5...
comment in response to
post
I’ve been working for years full-time on an Artificial General Intelligence Safety research program closely related to that idea! ↓ (from www.lesswrong.com/posts/kYvbHC... )
comment in response to
post
You say “overreacting”, and your headline says “moderately skeptical”, but when I skimmed it I didn’t really see that much disagreement 🤔
Separately, “invasive species” type risks were discussed in the report too, even if they weren’t as emphasized as “disease” type risks. Right?
comment in response to
post
I could go on. None of these is PROOF that AGI will cause human extinction, obviously! And I could make a long list of optimistic “demos” too. But I do think there are some people simply forgetting about possible problems, and maybe one or two of those “demos” would be helpful for brainstorming. 16/16
comment in response to
post
Here is a “demo” that it’s possible for a future technology to feel impossibly, laughably far away from being invented, when it’s actually mere months away. en.wikipedia.org/w/index.php?... 15/16
comment in response to
post
And here is a “demo” that it’s possible for a small number of such highly competent agents to maneuver their way into dictatorial control over a much much larger population of humans. www.lesswrong.com/posts/ivpKSj... 14/16
comment in response to
post
Here is a “demo” that the arrival of a new kind of highly competent agent with the capacity to invent technology, coordinate at scale, self-reproduce, etc., is a big friggin’ deal. en.wikipedia.org/wiki/Human_i... 13/16
comment in response to
post
Here is a “demo” that it’s possible for there to be a global catastrophe causing millions of deaths and trillions of dollars of damage, and then immediately afterwards everyone goes back to not even taking trivial measures to prevent similar or worse catastrophes from recurring. 12/16
comment in response to
post
Here is a “demo” that it’s possible for companies to ignore or suppress obvious future problems when they would interfere with immediate profits. 11/16
comment in response to
post
Every week we get more “demos” that, if next-token prediction is insufficient to make powerful autonomous AGIs that can accomplish long-term goals via out-of-the-box strategies, then people will keep searching for other approaches that CAN do that. 10/16
comment in response to
post
Here is a “demo” that, given a tradeoff between AI transparency (English-language chain-of-thought) and AI capability (inscrutable chain-of-thought but the results are better), many people will choose the latter, and pat themselves on the back for a job well done. arxiv.org/abs/2412.06769 9/16
comment in response to
post
…but here is a “demo” that it’s possible for people to do experiments that threaten the whole world, despite a long track record of direct evidence that this exact thing is a threat to the whole world, wildly out of proportion to its benefit, and that governments may even fund them. 8/16
comment in response to
post
Here’s a “demo” that if you give random people access to an AI, one of them might ask it to destroy humanity, just to see what would happen. Granted, I think this person had justified confidence that this particular AI would fail to destroy humanity… decrypt.co/126122/meet-... 7/16
comment in response to
post
(And here’s a “demo” that at least one powerful tech company executive might be fine with AGI wiping out humanity anyway.) observer.com/2024/06/argu... 6/16
comment in response to
post
Here’s a “demo” that it’s possible for a large AGI development project to decide that even TRYING to make nice docile AGIs is ALREADY overkill, because AGIs will just automatically be nice, again for reasons that don’t stand up to scrutiny. www.lesswrong.com/posts/ixZLTm... 5/16
comment in response to
post
Here’s a “demo” that it’s possible for a large active AGI development project to have a technical plan that’s claimed to create nice docile AGIs, but the plan would actually make callous sociopath AGIs, and fixing it is an unsolved problem: www.alignmentforum.org/posts/C5guLA... 4/16
comment in response to
post
(But first, a common misconception is that we’re worried about human extinction from current AIs, rather than future AIs. See my handy FAQ) www.lesswrong.com/posts/uxzDLD... 3/16
comment in response to
post
In particular, I think lots of people are over-optimistic about the situation because they’re not thinking realistically about people & institutions. For them, a “demo” that GPT-o1 generates output X from prompt Y, Z% of the time, is unsurprising & irrelevant. But there are lots of other “demos”! 2/
comment in response to
post
The updates since the first version last year are in a changelog at the bottom. Mostly minor: crisper explanations, more examples, etc. But I also deleted an ingredient from the pseudocode box; see changelog for why. Here’s the link again! www.lesswrong.com/posts/7kdBqS... (6/6)
comment in response to
post
This hypothesis should be experimentally testable. Next step is probably a retrograde neural tracer study from the PAG cell groups recently studied by Gloveli et al. (5/6)
comment in response to
post
For the brain level: I think it’s this pseudocode ↓
In the post, I discuss at length why the pseudocode is compatible with the evolutionary “spec”, and how the pseudocode is consistent with everyday experience, including physical play, conversational laughter, and humor. (4/6)