A lot of folks are starting to talk about o11y 2.0 and how it's different -- dev vs ops, pillars vs log events, etc. Which is cool. But there is only ONE core distinction between 1.0 and 2.0; everything else is a consequence. http://dlvr.it/TGVl0h - ThreadSky

charity.wtf • 88 days ago

A lot of folks are starting to talk about o11y 2.0 and how it's different -- dev vs ops, pillars vs log events, etc. Which is cool. But there is only ONE core distinction between 1.0 and 2.0; everything else is a consequence.

http://dlvr.it/TGVl0h

Comments

neilgilbert.bsky.social•69 days ago

I'm struggling to get wide logs in DotNet. Keeping the context throughout a request and logging everything at the end just doesn't feel like a natural fit. We have a package that aggregates all log events into one, but it feels a bit wrong.
Tracing is the answer but it's hard to get folks to shift.

neilgilbert.bsky.social•69 days ago

I'd really like to see examples of how people are doing this in DotNet with Logs (if at all)

neilgilbert.bsky.social•69 days ago

It doesn't help when the o11y tooling we use is geared towards logging over tracing (more functionality etc 🐕)

martindotnet.bsky.social•68 days ago

Canonical logs are a mindset shift on the visualisation side way more than they are on the generation side.

If your tool treats logs as a linear series of events and doesn't allow ad hoc aggregations and filtering, you'll struggle to see the benefit (your 2 tools don't do this well, unfortunately).

neilgilbert.bsky.social•67 days ago

I think I need to fire up some other tooling to see how it works.
I know I'm trying to shoehorn in something that possibly isn't the right fit, but I'm trying to find ways to push things forward.
I'm just struggling to get my head around it with the current setup and if it can bring value.

martindotnet.bsky.social•67 days ago

Remember, we're free for 20m events, and as you're on OTel it should be pretty easy to try us out.

And here's an offer... if you want to do a guest blog about your experience, I can get that increased while we work through it together and help everyone understand the benefit of changing how they think.

martindotnet.bsky.social•68 days ago

The original Honeycomb Beeline I created back in 2018 did this, though not with logs: it was an append-only object you could add additional context to as you went.

If you treat logs as a linear series of events in code, and then don't see that in the backend, that can be jarring.
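
A minimal sketch of the "append-only object you add context to as you go" idea, in Python rather than .NET. This is not the Beeline API; `WideEvent`, `handle_request`, and every field name are hypothetical, chosen only to show one wide record being emitted at the end of a request instead of many log lines along the way.

```python
import json
import time


class WideEvent:
    """Hypothetical append-only context for one request (not the Beeline API)."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._start = time.monotonic()

    def add(self, key, value):
        # Append-only: refuse to overwrite a field that was already recorded.
        if key in self._fields:
            raise KeyError(f"field already set: {key}")
        self._fields[key] = value

    def emit(self):
        # Called once, at the end of the request: one wide record, not many lines.
        record = dict(self._fields)
        record["duration_ms"] = round((time.monotonic() - self._start) * 1000, 3)
        return json.dumps(record, sort_keys=True)


def handle_request(user_id, cart_items):
    # Hypothetical handler: context accumulates as the request progresses.
    event = WideEvent(route="/checkout", user_id=user_id)
    event.add("cart_size", len(cart_items))
    event.add("status", 200)
    return event.emit()
```

Calling `handle_request("u1", ["a", "b"])` returns a single JSON string carrying the route, user, cart size, status, and duration together, which is the shape a backend would then aggregate and filter on.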

neilgilbert.bsky.social•67 days ago

It's the linear bit I think. We aggregate all the 'info', request/response logs into one. I'm not sure that's needed if you can group them by traceid or if that even falls under the definition of canonical logging.
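
For illustration only, grouping by trace ID after the fact might look like the sketch below. It assumes each log line is a dict with a `trace_id` key; `merge_by_trace` and the field names are hypothetical. Whether the result counts as canonical logging is exactly the open question above.

```python
from collections import defaultdict


def merge_by_trace(log_lines):
    """Fold per-line log dicts into one dict per trace_id.

    Later lines win when two lines set the same key.
    """
    merged = defaultdict(dict)
    for line in log_lines:
        merged[line["trace_id"]].update(line)
    return dict(merged)
```

The trade-off is that this pushes the aggregation to query time, so it only helps if the backend can do that grouping cheaply.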

charity.wtf•68 days ago

Oh boy, do I ever have the right person to help you. @martindotnet.bsky.social?

martindotnet.bsky.social•68 days ago

Hahaha... we've been discussing it in DM for a while!

charity.wtf•67 days ago

Ah, of course you have 😌🥂

abailly.bsky.social•86 days ago

In 2018, @abolibibelot.bsky.social and I presented a talk at CodeMesh titled “One Log”, which tried to make the point that the log of events happening in your system is the one true source of observability of the system. Seems like the times are ripe for this to be embraced more widely.

charity.wtf•69 days ago

Oh, neat!!! Yeah!!

typingloudly.zip•88 days ago

Having spent years doing infrastructure and security incident management for two large MSPs, observability is so interesting to me.

At both, I was able to work my way into positions to be able to make configuration and even some dev changes to our tooling, but never enough to make a difference.

typingloudly.zip•88 days ago

We started off staring at dashboards.

Then we piped all those dashboard events directly into incident management tools, shipped those tickets offshore to be triaged poorly before coming back to us, and ended up with more noise and worse SLA misses than just staring at the dashboards!

typingloudly.zip•88 days ago

*We also did a lot of monitoring of application events for telecom- and contact-center-related applications, which are notoriously noisy.

charity.wtf•86 days ago

Oh dear, lol. How long ago was this?

typingloudly.zip•86 days ago

Not that long ago. Left that job about 4 years ago and I'm sure they're still doing it mostly wrong.

Current job is doing these same things also wrong in slightly different ways, but I'm a few more steps removed from the event soup in my current role.

kaarolch.bsky.social•86 days ago

I dislike versioning an idea or concept, because it's another way for marketing to sell product. I know 2-3 companies that built their O11y on app events about 6-8 years ago. They did it because it was the natural way the dev team wanted to see it, and ops was busy with cloud migration :)

charity.wtf•86 days ago

i get it.. i'm pretty cynical about the way people invent words and categories in order to sell products, too.

in this case, i'm just trying to help bring some clarity to a crowded space where a lot of folks are very confused. i hope the 3 pillars vs single source of truth explanation is helpful.

charity.wtf•86 days ago

i'm definitely not trying to claim that it's new or that we invented it or anything! hell, @dyanacek.bsky.social was instrumenting AWS services this way 10 (15??) years ago.

done well, technical language can help people organize their thoughts, find each other and build upon each other's practices.

charity.wtf•86 days ago

the fact that teams have been independently discovering and rediscovering, inventing and reinventing the o11y 2.0 method of gathering and storing telemetry for decades, to me, indicates that we desperately need some common terminology to rally around.

if not o11y 2.0, then what do you suggest?

spanktar.bsky.social•86 days ago

I just say “Honeycomb” 🙂

kaarolch.bsky.social•86 days ago

Why do we need to define one single name for it :) A few weeks ago I spoke with one of the sales folks from HC and I really appreciated his statement: we do it differently; if you would like to get the full power of Honeycomb, you need to change your mindset about how you observe your app and focus on events.

kaarolch.bsky.social•86 days ago

I dislike O11y 1.0, 2.0 because versioning could introduce the “legacy” label. I still see places where people can successfully use 1.0 and don’t need 2.0, as well as places where people can mix 1.0 and 2.0, or use 2.0 as you mentioned.

josh.sg•86 days ago

“Wide events” was the label that made it click for me. “O11y 2.0” made me think of versioning as well, but “wide events” or something like that conveys what you’re doing much more clearly

humainary.io•86 days ago

Embedding semantic constructs like “event” at the sensing and streaming layer introduces overhead, complexity, and inflexibility.

https://www.linkedin.com/feed/update/urn:li:activity:7267903124500107264/

dyanacek.bsky.social•85 days ago

I've always found it handy to be able to add some instrumentation in my code, and know that it'll show up as a dashboardable/alarmable metric with a predictable dimensionality (to balance cost with performance).

dyanacek.bsky.social•85 days ago

But to also know that it'll be recorded along with all my other attributes that I wrote down in the span so that I can slice and dice things in any way I want later on. So in short - yeah agreed about the wide event idea! And I do appreciate finding names to call these practices, so thanks for that!

michaelwilde.bsky.social•87 days ago

97% correct. The other 3% is that infra metrics are a thing, as platform eng/infra peeps are the Formula 1 engineers of resources. Nothing wrong with looking at charts of non-transactional metrics. Can’t derive everything from events. (yet)

charity.wtf•86 days ago

Metrics aggregated around the health of the system fall pretty squarely in the o11y 1.0 bucket, IMO. Which is the overwhelming majority of metrics use cases.

But there are a few app-centered metrics that deserve to exist, like counters. I'll give you that. 😉

metalmatze.de•86 days ago

That's what I always come back to as well. We want some high level metrics - mostly counters - to create our SLOs on top of, right? Have you created SLOs on the wide events too?

michaelwilde.bsky.social•86 days ago

Yes. That’s actually what Honeycomb’s SLO product does. Its SLI is based on actual events rather than the observed time aggregations taught in the Google SRE handbook. It’s a different level of detail, which is effective for many folks.

metalmatze.de•86 days ago

That's super cool! Yet, quite expensive if you HAVE to track every single event with full context, isn't it?
It means you can't e.g. sample the underlying events that include the full context and still see the full picture by having a higher level counter.

quail.wtf•86 days ago

If you tell Honeycomb "this event represents 5 events that got dropped due to sampling", it will correct everything when calculating SLOs. The sample rate is also per-event, so you can e.g. keep all errors with a sample rate of 1 and sample everything else at 1/5.
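
A rough sketch of the arithmetic behind that correction, assuming each kept event carries a `sample_rate` field meaning "this event stands for N original events". This is not Honeycomb's implementation; `weighted_sli` and the `ok` field are hypothetical.

```python
def weighted_sli(events):
    """Fraction of good requests, counting each kept event sample_rate times."""
    total = sum(e["sample_rate"] for e in events)
    good = sum(e["sample_rate"] for e in events if e["ok"])
    # No events means no SLI to report.
    return good / total if total else None


# Errors kept at rate 1, successes kept at 1 in 5 (rate 5).
kept = [{"ok": False, "sample_rate": 1}] * 2 + [{"ok": True, "sample_rate": 5}] * 20
sli = weighted_sli(kept)  # 100 good out of 102 estimated requests
```

Without the weighting, the same kept events would read as 20 good out of 22, which understates the success rate because errors are over-represented in the sample.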

michaelwilde.bsky.social•86 days ago

It’s mostly possible if you do tail-based sampling (traces are the most popular thing to use for event-based SLOs in HNY, because spans are the only type of event that requires a duration). In fact, the economics of o11y SaaS pretty much require you to sample.
