Are You Testing Your Observability? Patterns for Instrumenting Your Services

FOSDEM 2020

Observability is the key to understand how your application runs and behaves in action. This is especially true for distributed environments like Kubernetes, where users run Cloud-Native microservices.

Among many other observability signals like logs and traces, the metrics signal has a substantial role. Sampled measurements observed throughout the system are crucial for monitoring the health of the applications and, they enable real-time, actionable alerting. While there are many open-source robust libraries, in various languages, that allow us to easily instrument services for backends like Prometheus, there are still numerous possibilities to make a mistake or misuse those tools.

During this talk, two engineers from Red Hat: Kemal and Bartek (Prometheus and Thanos project maintainer) will discuss valuable patterns and best practices for instrumenting your application. The speakers will go through common pitfalls and failure cases while sharing valuable insights and methods to avoid those mistakes. In addition, this talk will demonstrate, how to leverage unit testing to verify the correctness of your observability signals. How it helps and why it is important. Last but not least, the talk will cover a demo of the example instrumented application based on the experience and projects we maintain.

The audience will leave knowing how to answer the following important questions:

What are the essential metrics that services should have? Should you test your observability? What are the ways to test it on a unit-test level? What are the common mistakes while instrumenting services and how to avoid them?

And more!

The end goal of this talk is to demonstrate to the audience, how to harvest the powers of metric-based instrumentation in their applications. We would like to share some pragmatic, best practices and common patterns that we learned while maintaining several open-source projects.

During this talk:

We will discuss valuable patterns and best practices for instrumenting libraries and applications. We will go through a set of common pitfalls failure cases, and methods to avoid those mistakes. Some of the topics we plan to mention: common cardinality issues, summaries vs histograms, choosing histogram bucket, testing, instrumenting libraries vs applications, common middlewares etc We will demonstrate, why, when and how to leverage unit testing to verify your observability signals. We plan to present a demo of the example instrumented application. We plan to use Go as an example language of such application but the talk should be mostly language agnostic.

Speakers: Bartek Plotka Kemal Akkoyun