A lot of folks are starting to talk about o11y 2.0 and how it's different: dev vs ops, pillars vs log events, etc. Which is cool. But there is only ONE core distinction between 1.0 and 2.0; everything else is a consequence.
http://dlvr.it/TGVl0h
Comments
Tracing is the answer, but it's hard to get folks to shift.
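A minimal sketch of why tracing helps, assuming each span is recorded as a structured event carrying `trace_id`, `span_id`, `parent_id`, `name` and `duration_ms` fields (hypothetical names chosen for illustration, not any particular vendor's schema). Reassembling the parent/child tree from those events is what turns a pile of log lines into the causal story of one request:

```python
from collections import defaultdict

# Hypothetical span events; the field names are illustrative assumptions.
SPANS = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "name": "GET /checkout", "duration_ms": 120},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "auth", "duration_ms": 15},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "db.query", "duration_ms": 90},
    {"trace_id": "t1", "span_id": "d", "parent_id": "c", "name": "db.connect", "duration_ms": 40},
]


def build_tree(spans):
    """Index spans by parent_id and return (roots, children)."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        if span["parent_id"] is None:
            roots.append(span)
        else:
            children[span["parent_id"]].append(span)
    return roots, children


def render(span, children, depth=0):
    """Return indented lines for a span and all of its descendants."""
    lines = ["  " * depth + f"{span['name']} ({span['duration_ms']} ms)"]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


if __name__ == "__main__":
    roots, children = build_tree(SPANS)
    for root in roots:
        print("\n".join(render(root, children)))
```

Each span only knows its parent; the tree (and therefore where the 120 ms went) is reconstructed at read time.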
If your tool treats logs as a linear series of events and doesn't allow ad hoc aggregations and filtering, you'll struggle to see the benefit (unfortunately, your two tools don't do this well).
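To make the distinction concrete: when each event is a set of key/value pairs rather than a line of text, you can filter and group by any field after the fact, including high-cardinality ones like a customer ID, without deciding in advance which metrics to keep. A minimal sketch in plain Python; the event fields and the `query` helper are hypothetical, standing in for what an event-oriented backend does at read time:

```python
from collections import defaultdict

# Hypothetical structured events; fields are illustrative assumptions.
EVENTS = [
    {"endpoint": "/checkout", "customer_id": "c42", "status": 500, "duration_ms": 930},
    {"endpoint": "/checkout", "customer_id": "c7", "status": 200, "duration_ms": 110},
    {"endpoint": "/search", "customer_id": "c42", "status": 200, "duration_ms": 45},
    {"endpoint": "/checkout", "customer_id": "c42", "status": 500, "duration_ms": 870},
]


def query(events, where, group_by):
    """Filter events with an arbitrary predicate, then group by any field.

    Returns {group_value: {"count": n, "max_ms": m}}.
    """
    groups = defaultdict(list)
    for event in events:
        if where(event):
            groups[event[group_by]].append(event["duration_ms"])
    return {
        key: {"count": len(durations), "max_ms": max(durations)}
        for key, durations in groups.items()
    }


if __name__ == "__main__":
    # Which customers are hitting errors on /checkout? Nobody had to predefine this.
    errors_by_customer = query(
        EVENTS,
        where=lambda e: e["endpoint"] == "/checkout" and e["status"] >= 500,
        group_by="customer_id",
    )
    print(errors_by_customer)  # {'c42': {'count': 2, 'max_ms': 930}}
```

A tool that only lets you scroll or grep the same events in time order can't answer that question without someone having anticipated it.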
I know I'm trying to shoehorn something that possibly isn't the right fit, but I'm trying to find ways to push things forward.
I'm just struggling to get my head around it with the current setup and whether it can bring value.
And here's an offer... if you want to do a guest blog about your experience, I can get that increased while we work through it together and help everyone understand the benefit of changing how they think.
If you treat logs as a linear series of events in code, and then don't see that in the backend, that can be jarring.
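One way to close that gap, sketched under assumptions: instead of writing several log lines as a request progresses, accumulate context into a single structured event and emit it once when the unit of work finishes. The `handle_request` function, its fields, and printing JSON to stdout are all hypothetical stand-ins for your own handler and event pipeline:

```python
import json
import time


def handle_request(user_id, cart_items):
    """Handle one hypothetical checkout request, emitting one wide event."""
    event = {"name": "checkout", "user_id": user_id, "cart_items": cart_items}
    start = time.monotonic()
    try:
        # Instead of log.info("validating cart"), record what you learned as a field.
        event["cart_valid"] = cart_items > 0
        if not event["cart_valid"]:
            raise ValueError("empty cart")
        event["status"] = 200
    except ValueError as exc:
        event["status"] = 400
        event["error"] = str(exc)
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        # One line per unit of work, carrying all the context gathered above.
        print(json.dumps(event))
    return event


if __name__ == "__main__":
    handle_request("u123", 3)
    handle_request("u456", 0)
```

The code still reads top to bottom, but what reaches the backend is one queryable record per request rather than a sequence of lines you have to stitch back together.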
At both, I was able to work my way into positions where I could make configuration and even some dev changes to our tooling, but never enough to make a difference.
Then we piped all those dashboard events directly into incident management tools, shipped those tickets offshore to be triaged poorly before they came back to us, and ended up with more noise and worse SLA misses than if we had just stared at the dashboards!
My current job is doing these same things wrong too, in slightly different ways, but I'm a few more steps removed from the event soup in my current role.
in this case, i'm just trying to help bring some clarity to a crowded space where a lot of folks are very confused. i hope the 3 pillars vs single source of truth explanation is helpful.
done well, technical language can help people organize their thoughts, find each other and build upon each other's practices.
if not o11y 2.0, then what do you suggest?
https://www.linkedin.com/feed/update/urn:li:activity:7267903124500107264/
But there are a few app-centered metrics that deserve to exist, like counters. I'll give you that. 😉
It means you can't, e.g., sample the underlying events that include the full context and still see the full picture by having a higher-level counter.
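A minimal sketch of the trade-off being described, under assumptions: events are kept with a fixed 1-in-N probability (`SAMPLE_RATE` is a hypothetical parameter), an exact counter is incremented for every event regardless, and each kept event records its sample rate so the total can also be estimated by weighting. This is one common approach to sampling, not necessarily what either commenter had in mind:

```python
import random

SAMPLE_RATE = 10  # Hypothetical: keep roughly 1 in 10 events.


def process(events, rng):
    """Count every event exactly, but keep only a random sample of them."""
    exact_count = 0
    kept = []
    for event in events:
        exact_count += 1  # The higher-level counter sees everything.
        if rng.randrange(SAMPLE_RATE) == 0:
            # Record the rate so each kept event can stand in for SAMPLE_RATE events.
            kept.append({**event, "sample_rate": SAMPLE_RATE})
    return exact_count, kept


def estimate_count(kept):
    """Estimate the original total by weighting each kept event by its sample rate."""
    return sum(e["sample_rate"] for e in kept)


if __name__ == "__main__":
    rng = random.Random(0)  # Fixed seed so runs are repeatable.
    events = [{"request_id": i, "status": 500 if i % 50 == 0 else 200} for i in range(10_000)]
    exact, kept = process(events, rng)
    print("exact:", exact, "estimated:", estimate_count(kept), "kept:", len(kept))
```

The counter is exact but carries no context; the weighted estimate is only approximate, but every kept event still has its full set of fields to slice by.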