High-performance flash storage such as NVMe SSDs is becoming commonplace in production environments. In this age of fast storage, inefficiencies in storage software stacks that were acceptable with slow spinning devices are no longer tolerable. Even faster technologies, like NVDIMMs, are on the horizon. These developments are particularly challenging for distributed storage solutions like Gluster and Ceph, where network latencies are a fact of life.
This talk covers performance analysis and troubleshooting tools and techniques for this age of fast storage.
We will share some of our experiences in driving performance improvements in Gluster so that it can better deliver the performance of fast hardware devices to applications that need it. We will cover our performance analysis methodology and some of the common problem areas we see as we try to scale performance. We will discuss some of the tools available for troubleshooting performance bottlenecks, like perf and mutrace. Tools alone are insufficient to troubleshoot performance issues, however, so we will also share some of the techniques we employ to get the needed insights from them. Finally, we will present examples of how instrumentation in the code is sometimes invaluable in identifying performance issues.