stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
138 posts
2,762 followers
82 following
comment in response to
post
POV: You unfollowed every bsky account that posts too much stressful content about the Trump administration
comment in response to
post
I’m more pessimistic than you about the feasibility of “WBE without understanding”, for reasons here → www.lesswrong.com/posts/wByPb6... . I’m also less pessimistic about the feasibility of gaining sufficient understanding for WBE; see the last paragraph here ↓
comment in response to
post
…So, when LLMs “talk about themselves”, we learn SOMETHING from what they say, but it needs to be interpreted carefully.
I’m not making any particular point; I just find this a helpful framing, and I find myself bringing it up in conversation from time to time. (2/2)
comment in response to
post
if only the people saying “if only those people had studied history, then they would have avoided such mistakes” had studied history, then they would have avoided such mistakes
comment in response to
post
Jumping right into Bulverism, there’s a common tendency to treat ego-dystonic “urges” as coming from human brain RL and ego-syntonic “desires” as coming from our ethereal souls via free will. I’ve written about why that mistake feels so intuitive, see here: www.lesswrong.com/posts/7tNq4h... 3/3
comment in response to
post
Whenever you see a human ignoring short-term pleasure for long-term goals or values, you can thank human brain RL for that, just as much as you can thank human brain RL for when it happens the other way around. 2/3
comment in response to
post
Today’s RL algorithms are usually not great at long-term planning in complex environments, mainly because long-term planning in complex environments is a hard problem, e.g. the combinatorial explosion of possible action sequences. (So much the worse for today’s RL algorithms!) But I don’t think that relates to humans 1/3
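(To make the “combinatorial explosion” concrete, here’s a toy back-of-the-envelope sketch; the branching factor and horizon are made-up numbers for illustration, not anything from the thread.)

```python
# Toy illustration of combinatorial explosion in long-horizon planning.
# Both numbers are hypothetical, chosen only for illustration.
branching_factor = 10  # assumed number of available actions per step
horizon = 20           # assumed planning depth (steps of lookahead)

num_action_sequences = branching_factor ** horizon
print(f"{num_action_sequences:.3e} possible action sequences")
# -> 1.000e+20, far too many to enumerate exhaustively
```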
comment in response to
post
Yeah, the current “JargonBot” has hover-over pop-up definitions, and that was the LaTeX plan / idea too.
Not sure about a chat interface. I feel like lesswrong dev people were talking about things like that too, but they haven’t shipped anything; I dunno the details.
comment in response to
post
Related(ish): A lesswrong dev was working towards using LLMs for mouse-over definitions within LaTeX formulas in blog posts. He started by releasing a simpler but related feature—auto-defining jargon in plain text. Not sure what the status is for the LaTeX thing. www.lesswrong.com/posts/sZvMLW...
comment in response to
post
I imagine you’ve already seen these, but I for one enjoyed Joe Carlsmith’s pair of essays related to that: (1) “Why should ethical anti-realists do ethics?”
joecarlsmith.com/2023/02/16/w... (2) “Seeing more whole” joecarlsmith.com/2023/02/17/s...
comment in response to
post
Funny! Claude Sonnet gets these correct though. Do you know what the Google “AI Overview” model is?
comment in response to
post
OK, so lesswrong.com is a mix of (1) a blogging platform (same genre as wordpress) and (2) a community forum (same genre as a subreddit) with, umm, idiosyncratic focus.
For the blogging platform part, they built their own blog-writing software. The margin thing is just one of the built-in features.
comment in response to
post
Again, here’s the link to the post! I’m learning as I go, happy for feedback & discussion! Thanks @ent3c.bsky.social for your book, which was exceptionally clear and helpful, even if I wound up disagreeing with parts of it. (7/7) www.lesswrong.com/posts/xXtDCe...
comment in response to
post
Fifth, the context of trying to understand some outcome (schizophrenia, extroversion, or whatever) by studying the genes that correlate with it. I argue that this activity is useful on the margin and can be done well, despite the humiliating “candidate gene” fiasco of a couple decades ago. (6/7)
comment in response to
post
Fourth, the context of polygenic scores (PGS) and the Missing Heritability Problem. I’m a big advocate for “epistasis”, i.e. the idea that the map from genome to (some but not all) outcomes is nonlinear. I argue that epistasis in human outcomes is widely misunderstood even by experts. (5/7)
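(A minimal toy sketch of what a “nonlinear map” can look like, entirely my own illustration rather than anything from the thread: an outcome driven partly by an interaction between two hypothetical loci, which a purely additive model, the kind a linear polygenic score assumes, only partly captures.)

```python
import numpy as np

# Toy sketch of epistasis (my own illustration, not from the thread):
# the outcome depends on an interaction between two loci, so a purely
# additive (linear) model captures only part of the variance.
rng = np.random.default_rng(0)
n_people, n_loci = 10_000, 2
genotypes = rng.integers(0, 3, size=(n_people, n_loci))  # 0/1/2 allele counts

outcome = (1.0 * genotypes[:, 0]                      # additive effect
           + 2.0 * genotypes[:, 0] * genotypes[:, 1]  # epistatic (interaction) effect
           + rng.normal(0.0, 1.0, n_people))          # noise / environment

# Best additive fit, i.e. what a linear polygenic score assumes.
X = np.column_stack([np.ones(n_people), genotypes])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
residual = outcome - X @ beta
r2_additive = 1.0 - residual.var() / outcome.var()
print(f"Variance explained by the additive model: {r2_additive:.2f}")
# Below 1.0 even with no noise in the interaction term: some variance
# is simply out of reach of any additive model.
```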
comment in response to
post
Third, the context of assessing whether YOU should try hard to be your best self, or whether “I shouldn’t bother because my fate is determined by my genes”. (Spoiler: it’s the former!) (4/7)
comment in response to
post
Second, the context of parenting decisions. I bring up what I call “the bio-determinist child-rearing rule-of-thumb”, why we should believe it, and its broader lessons for how to think about childhood … AND the many important cases where it DOESN’T apply!! (3/7)
comment in response to
post
First, the context of guessing someone’s likely adult traits (disease risk, personality, …) based on their family history and childhood environment. That leads to twin and adoption studies. How do they work, what are the assumptions, what does “E” *really* mean, etc.? (2/7)
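(For reference, here is my gloss of the standard textbook ACE decomposition that classical twin studies rest on; it’s background I’m supplying, not something quoted from the thread.)

```latex
% My gloss of the standard ACE model behind classical twin studies
% (textbook identities, not quoted from the thread).
\documentclass{article}
\usepackage{amsmath}
\begin{document}
With the phenotype $P$ standardized to unit variance
($A$ = additive genetics, $C$ = shared environment,
$E$ = non-shared environment plus measurement error):
\[
  \mathrm{Var}(P) = a^2 + c^2 + e^2 = 1, \qquad
  r_{\mathrm{MZ}} = a^2 + c^2, \qquad
  r_{\mathrm{DZ}} = \tfrac{1}{2}a^2 + c^2 ,
\]
so Falconer's formulas give
\[
  a^2 = 2\,(r_{\mathrm{MZ}} - r_{\mathrm{DZ}}), \qquad
  c^2 = 2\,r_{\mathrm{DZ}} - r_{\mathrm{MZ}}, \qquad
  e^2 = 1 - r_{\mathrm{MZ}} .
\]
\end{document}
```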
comment in response to
post
There’s a lyric in Frozen 2: “You feel what you feel, and those feelings are real”. I repeat it all the time, probably every couple days. Context is usually something like telling a kid “you’re allowed to feel annoyed at me, as long as you’re following house rules”. Things like that.
comment in response to
post
Probably not the kind of “detailed analysis” you’re looking for but I casually discuss background considerations & intuitions in §3.2-3.3 of www.alignmentforum.org/posts/LJD4C7...
Also, long back-and-forth w Matt Clancy ending here: x.com/steve47285/s...
(This is about future AI not Jan 2025 AI)
comment in response to
post
I think you’d like this series of 8 essays, on the project of getting from neuroscience algorithms to introspective self-reports. (Very illusionism-adjacent! See §1.6.) “Self” comes up in almost every post, including exotic manifestations like trance & dissociation. www.lesswrong.com/posts/FtwMA5...
comment in response to
post
I’ve been working for years full-time on an Artificial General Intelligence Safety research program closely related to that idea! ↓ (from www.lesswrong.com/posts/kYvbHC... )
comment in response to
post
You say “overreacting”, and your headline says “moderately skeptical”, but when I skimmed it I didn’t really see that much disagreement 🤔
Separately, “invasive species” type risks were discussed in the report too, even if they weren’t as emphasized as “disease” type risks. Right?
comment in response to
post
I could go on. None of these is PROOF that AGI will cause human extinction, obviously! And I could make a long list of optimistic “demos” too. But I do think there are some people simply forgetting about possible problems, and maybe one or two of those “demos” would be helpful for brainstorming. 16/16
comment in response to
post
Here is a “demo” that it’s possible for a future technology to feel impossibly, laughably far away from being invented, when it’s actually mere months away. en.wikipedia.org/w/index.php?... 15/16
comment in response to
post
And here is a “demo” that it’s possible for a small number of such highly competent agents to maneuver their way into dictatorial control over a much much larger population of humans. www.lesswrong.com/posts/ivpKSj... 14/16
comment in response to
post
Here is a “demo” that the arrival of a new kind of highly competent agent with the capacity to invent technology, coordinate at scale, self-reproduce, etc., is a big friggin’ deal. en.wikipedia.org/wiki/Human_i... 13/16
comment in response to
post
Here is a “demo” that it’s possible for there to be a global catastrophe causing millions of deaths and trillions of dollars of damage, and then immediately afterwards everyone goes back to not even taking trivial measures to prevent similar or worse catastrophes from recurring. 12/16
comment in response to
post
Here is a “demo” that it’s possible for companies to ignore or suppress obvious future problems when they would interfere with immediate profits. 11/16
comment in response to
post
Every week we get more “demos” that, if next-token prediction is insufficient to make powerful autonomous AGIs that can accomplish long-term goals via out-of-the-box strategies, then people will keep searching for other approaches that CAN do that. 10/16
comment in response to
post
Here is a “demo” that, given a tradeoff between AI transparency (English-language chain-of-thought) and AI capability (inscrutable chain-of-thought but the results are better), many people will choose the latter, and pat themselves on the back for a job well done. arxiv.org/abs/2412.06769 9/16
comment in response to
post
…but here is a “demo” that it’s possible for people to do experiments that threaten the whole world, despite a long track record of direct evidence that this exact thing is a threat to the whole world, wildly out of proportion to its benefit, and that governments may even fund them. 8/16
comment in response to
post
Here’s a “demo” that if you give random people access to an AI, one of them might ask it to destroy humanity, just to see what would happen. Granted, I think this person had justified confidence that this particular AI would fail to destroy humanity… decrypt.co/126122/meet-... 7/16
comment in response to
post
(And here’s a “demo” that at least one powerful tech company executive might be fine with AGI wiping out humanity anyway.) observer.com/2024/06/argu... 6/16
comment in response to
post
Here’s a “demo” that it’s possible for a large AGI development project to decide that even TRYING to make nice docile AGIs is ALREADY overkill, because AGIs will just automatically be nice, again for reasons that don’t stand up to scrutiny. www.lesswrong.com/posts/ixZLTm... 5/16
comment in response to
post
Here’s a “demo” that it’s possible for a large active AGI development project to have a technical plan that’s claimed to create nice docile AGIs, but the plan would actually make callous sociopath AGIs, and fixing it is an unsolved problem: www.alignmentforum.org/posts/C5guLA... 4/16
comment in response to
post
(But first, a common misconception is that we’re worried about human extinction from current AIs, rather than future AIs. See my handy FAQ) www.lesswrong.com/posts/uxzDLD... 3/16
comment in response to
post
In particular, I think lots of people are over-optimistic about the situation because they’re not thinking realistically about people & institutions. For them, a “demo” that GPT-o1 generates output X from prompt Y, Z% of the time, is unsurprising & irrelevant. But there are lots of other “demos”! 2/
comment in response to
post
The updates since the first version last year are in a changelog at the bottom. Mostly minor: crisper explanations, more examples, etc. But I also deleted an ingredient from the pseudocode box; see changelog for why. Here’s the link again! www.lesswrong.com/posts/7kdBqS... (6/6)
comment in response to
post
This hypothesis should be experimentally testable. Next step is probably a retrograde neural tracer study from the PAG cell groups recently studied by Gloveli et al. (5/6)
comment in response to
post
For the brain level: I think it’s this pseudocode ↓
In the post, I discuss at length why the pseudocode is compatible with the evolutionary “spec”, and how the pseudocode is consistent with everyday experience, including physical play, conversational laughter, and humor. (4/6)