norootcause.surfingcomplexity.com
Student of complex systems failures, resilience engineering, cognitive systems engineering. Will talk your ear off about @resilienceinsoftware.org
1,010 posts 1,454 followers 521 following

Can we put the AIs on-call yet?

Hot take: a change freeze is itself a type of change

The incident that just happened has exposed the most salient risks. But that’s not the same as the risks most likely to bite you next. If your action items crowd out ongoing work to address other risks, you might make things worse.

Holy shit, this guy will NOT shut up about Fight Club.

New blog post: Not causal chains, but interactions and adaptations surfingcomplexity.blog/2025/05/19/n...

Tradeoffs, tradeoffs everywhere

Weird how something can happen and several months later something completely unrelated happens. Really makes you think.

Right now, your system is broken in a thousand different ways that you don’t even know about, without obvious symptoms. One of those breakages led to your last incident. A different existing breakage will lead to your next incident.

Schools are on the cutting edge!

I hear “we need to teach kids AI in school” and, I dunno, man, sounds to me like the problem is that these kids have learned all too well how to use AI in a school context.

Another fun exercise to think through: what are the things that your incident responders are most likely going to struggle with during the next high-severity incident?

Incident metric I’d like to see: how aware were we, in advance of the incident, about the risk that manifested as that incident? For each incident, assign a numerical rating (e.g., 0-100) of a priori risk awareness, and then look at how well your organization actually understands its risks.
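A minimal sketch of how that rating could be aggregated, assuming each incident record gets a 0–100 awareness score assigned during the retrospective; the incident data and field names here are made up for illustration, not from any real tooling:

```python
# Sketch: aggregate "a priori risk awareness" scores across incidents.
# 0 = nobody saw this risk coming; 100 = it was a well-known, tracked risk.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Incident:
    name: str
    risk_awareness: int  # rated retrospectively, after the incident review

# Hypothetical example data.
incidents = [
    Incident("payments outage", risk_awareness=80),  # known risk, on the backlog
    Incident("cert expiry", risk_awareness=95),      # tracked, alert even existed
    Incident("retry storm", risk_awareness=10),      # genuine surprise
]

# The mean is a rough signal of how well the org understands its risks;
# the count of near-zero scores (true surprises) is arguably more telling.
print(f"mean awareness: {mean(i.risk_awareness for i in incidents):.0f}/100")
print(f"surprises (<25): {sum(1 for i in incidents if i.risk_awareness < 25)}")
```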

Tell the story of the incident from multiple perspectives. Multiple people were involved, and they each have a different view of what happened.

The system and the environment

In my experience, if you ask someone “how are things going reliability-wise in your company?”, the response generally falls into one of two buckets:
1. 🔥 Everything’s on fire!
2. 🤷 We don’t actually know how things are going

I’ll start believing that the RCA approach to incident analysis provides genuine insight into the nature of complex systems failures when users of it stop being surprised by the fact that they keep getting surprised by incidents.

Why do they call them surgeons and not operators?

It’s wild that queueing theory is a whole research field. “You know the whole ‘waiting in line’ thing? Like, in the grocery store? What if we went super-deep into that?”

You know those work meetings that you would rather not attend but feel obligated to show up to? Imagine being able to send an AI ambassador to these meetings in your stead. Eventually it’s probably just all AIs in the meeting.

FLAT EARTHER: *drops a bunch of change* and that’s how god created our solar system

Who called it insomnia and not resisting a rest

Have any journal editors been caught using LLMs as reviewers yet? How would we ever even find out?

Folks, why aren’t we using LLMs for generating schedule estimates for development work??? We all loathe making those estimates, and we can just blame the LLM if the actual development time deviates from the estimate. “The LLM must have hallucinated a bad estimate”.