Correlation analysis in automated testing

FOSDEM 2020

Correlation analysis is a statistical method used to discover whether there is a relationship between two variables, and how strong that relationship might be. A correlation coefficient is a numerical measure of such correlation. According to the Cauchy–Schwarz inequality, it has a value between −1 and +1, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. One of the axioms of automated testing is that tests are independent, and therefore the correlation coefficient between any two tests should be equal to 0. But often it isn't. In this work, we present a method of evaluating the quality of test suites based on the correlation coefficient, and of finding their weak points. Using PC Engines open-source firmware regression test results, which are based on over 140 automated tests run with 2 flavors of software on 4 different platforms, we will show how the quality of a test suite can be described numerically, and how those results can be used to optimize test criteria.
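
As a brief reminder of where the bound above comes from, the standard definition of Pearson's coefficient and the Cauchy–Schwarz step can be written as follows. The symbols X, Y, r and σ are notation chosen here for illustration and are not taken from the talk; \operatorname assumes the amsmath package.

```latex
\[
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
\lvert \operatorname{cov}(X, Y) \rvert \le \sigma_X \, \sigma_Y
\;\Longrightarrow\;
-1 \le r_{XY} \le 1 .
\]
```

If X and Y are independent, then cov(X, Y) = 0 and hence r = 0, provided both standard deviations are non-zero. The converse does not hold in general: a coefficient of 0 only rules out a linear relationship.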

As far as automated testing is concerned, every test has only two possible outcomes: pass or fail. Pearson's correlation coefficient is originally defined as the covariance of the two variables divided by the product of their standard deviations, so the first question was how to apply it to Boolean variables. We assumed that the only value that matters is the failure of a test. During the lecture, we will present how mathematical analysis can reveal potential flaws in test criteria by targeting test cases that are likely to fail simultaneously.
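
One possible way to apply the definition above to pass/fail results is to encode each test as a vector of 0/1 failure indicators over a series of regression runs and compute Pearson's coefficient on those vectors. The sketch below is only an illustration under that assumption and is not necessarily the method presented in the talk; the function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def failure_correlation(x, y):
    """Pearson correlation of two equal-length 0/1 failure vectors (1 = fail).

    Returns None when either test has zero variance (it always passes or
    always fails), because the coefficient is undefined in that case.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlated_pairs(results, threshold=0.8):
    """List (test_a, test_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(results.items(), 2):
        r = failure_correlation(a, b)
        if r is not None and abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical results: one entry per regression run, 1 = fail, 0 = pass.
results = {
    "boot_test": [0, 1, 0, 1, 0, 0, 1, 0],
    "usb_test": [0, 1, 0, 1, 0, 0, 1, 0],
    "net_test": [1, 0, 0, 0, 1, 0, 0, 0],
    "always_pass": [0, 0, 0, 0, 0, 0, 0, 0],
}

for name_a, name_b, r in correlated_pairs(results):
    print(f"{name_a} and {name_b} tend to fail together (r = {r:.2f})")
```

Pairs reported by such a check are candidates for review: they may be exercising the same underlying fault, or sharing a dependency that violates the independence assumption. Tests that never fail carry no information for this measure, which is why they are skipped rather than reported as uncorrelated.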

Speakers: Łukasz Wcisło