Solid OpenAI post-mortem on @kubernetes.io API overload, caused by "per-node telemetry ingestion" https://status.openai.com/incidents/ctrsv3lwd797 🤔
Oddly similar to a common DaemonSet @prometheus.io / @opentelemetry.io gotcha for metric scrapes that we talked about in the past:
https://youtu.be/yk2aaAyxgKw?t=768 🙈
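For reference, a minimal sketch of the node-scoped discovery fix (an OTel Collector agent config; the job name is illustrative, and K8S_NODE_NAME is assumed to be injected via the Downward API). The same `selectors` knob works in plain Prometheus scrape configs:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-pods
          kubernetes_sd_configs:
            - role: pod
              # Without this selector, every per-node agent opens a
              # cluster-wide pod watch: N nodes => N full watches,
              # multiplying kube-apiserver load by the node count.
              selectors:
                - role: pod
                  field: spec.nodeName=${env:K8S_NODE_NAME}
exporters:
  debug: {}
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]
```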
Comments
The data plane inside Kubernetes relies on the kube-apiserver because of DNS-based service discovery…
However, the prevention part looks promising.
In cases like these I always see how safe our cluster-managing infrastructure actually is, and how monitoring makes it easy to find the offenders:
https://github.com/GoogleCloudPlatform/prometheus-engine
Another solution is of course to have a cluster-level service discovery that pushes the targets to the node agents (as sketched below), but a pure (optimized) DaemonSet usually scales well enough!
Somewhat relevant: https://opentelemetry.io/docs/kubernetes/operator/target-allocator/
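The target allocator is basically that push model: one central component does the apiserver watches and hands each DaemonSet agent only its own node's targets. A rough sketch, assuming the OTel Operator's v1beta1 CRD (names and values illustrative):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: node-agent
spec:
  mode: daemonset
  targetAllocator:
    enabled: true
    allocationStrategy: per-node   # one watcher, N consumers
    prometheusCR:
      enabled: true                # discover ServiceMonitors/PodMonitors centrally
  config:
    receivers:
      prometheus:
        config:
          scrape_configs: []       # filled in by the target allocator
    exporters:
      debug: {}
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [debug]
```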