Last week I had an interesting conversation about how to set goals for HTTP request latency:
1) Measure network latency client-side (browser or app) rather than at the edge.
2) Set a P75 goal rather than P95 or P99
e.g. P75 <= 250 ms
3) Set a P99/P75 goal
e.g. P99/P75 <= 6
Let's look into why:
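For point 1: measuring in the browser or app captures the latency users actually experience, including DNS, TLS, and the last-mile network that edge-side timing never sees. Here's a minimal sketch of collecting those numbers client-side with the browser's Performance API (the beacon endpoint and helper name are placeholders, not any particular product's API):

```typescript
// Minimal RUM sketch: observe every fetch/XHR the page makes and beacon its
// full client-side duration (DNS + connect + TLS + request + response).

function beaconLatency(sample: { url: string; durationMs: number }): void {
  // sendBeacon is fire-and-forget and survives page unloads.
  navigator.sendBeacon("/rum-collect", JSON.stringify(sample));
}

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceResourceTiming[]) {
    if (entry.initiatorType === "fetch" || entry.initiatorType === "xmlhttprequest") {
      beaconLatency({ url: entry.name, durationMs: entry.duration });
    }
  }
});
observer.observe({ type: "resource", buffered: true });
```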
Comments
But it is still server to server
There's nothing like RUM (real user monitoring) 🤗
The real indicator of a regression is the success rate dropping. Slow is bad; abandoned is worse.
Hence the choice of a P75 goal: it reflects the experience of the majority of real users rather than a handful of outliers.
So looking for a high P99/P75 ratio helps surface endpoints that scale poorly, e.g. endpoints that do N+1 queries or aren't paginated and return large lists.
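A minimal sketch of checking both goals from the collected samples, using nearest-rank percentiles (the endpoint names and numbers below are made up for illustration):

```typescript
// Check the two goals per endpoint: P75 <= 250 ms and P99/P75 <= 6.

function percentile(sortedMs: number[], p: number): number {
  // Nearest-rank percentile on an ascending-sorted array.
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

function checkLatencyGoals(endpoint: string, samplesMs: number[]): void {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const p75 = percentile(sorted, 75);
  const p99 = percentile(sorted, 99);
  const ratio = p99 / p75;

  if (p75 > 250) {
    console.warn(`${endpoint}: P75 ${p75} ms > 250 ms - typical users are waiting too long`);
  }
  if (ratio > 6) {
    // A wide tail relative to the typical request: look for N+1 queries,
    // unpaginated responses that return large lists, cold caches, etc.
    console.warn(`${endpoint}: P99/P75 = ${ratio.toFixed(1)} > 6 - scales poorly`);
  }
}

// An endpoint that meets both goals vs. one that passes P75 but has a heavy tail.
checkLatencyGoals("/api/profile", [80, 95, 110, 130, 150, 180, 200, 220, 300, 350]);
checkLatencyGoals("/api/orders", [90, 100, 120, 150, 200, 220, 240, 250, 900, 2400]);
```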