The world in which we monitor software is growing more complex every year. There are increasingly more ways to run server-side software, with many more independent services and more points of failures, the list goes on! On the plus side, there’s a lot of great tools and patterns being developed to try and make things simple to assess and understand. This talk covers how metrics and monitoring can be leveraged in a variety of different ways, auto-discovering applications and their usage of databases, caches, load balancers, etc, setting up and tearing down dashboards and monitoring automatically for services and instances, and more.
We’ll also talk about how you can accomplish all this with a global view of your systems using both Prometheus and Graphite with M3, our open source metrics platform. We’ll take a deep dive look at how we use M3DB, distributed aggregation with the M3 aggregator and the M3 Kubernetes operator to horizontally scale a metrics platform in a way that doesn’t cost outrageous amounts to run with a system that’s still sane to operate with petabytes of metrics data.