As the industry moves towards more cloud-based and containerised solutions such as Kubernetes, monitoring tools have to keep up. These new environments are far more dynamic than the hand-maintained machines of old, and they require more sophisticated and scalable approaches. This talk will look at how Prometheus has evolved over the past five years to cope better with these challenges, including the 2.0 release and the practices that we encourage in a cloud native world.
This talk will cover the dynamic nature of cloud-based services, and why approaches that worked when everything had a dedicated machine no longer work, either technically or in terms of human operational effort. These challenges include high churn from releases that may happen several times an hour, heterogeneous performance across replicas, and complex microservice dependency graphs. There will be a quick overview of the history of the Prometheus TSDB, giving context for the new storage engine in the 2.0 release and how it helps with the technical problems above. Following that, we will look at service discovery for finding what to monitor in dynamic environments, at the aggregation abilities of PromQL, and at how these can be combined to perform symptom-based rather than cause-based alerting using techniques such as the RED method.
Speakers: Brian Brazil