Correlation analysis is a statistical method used to determine whether there
is a relationship between two variables, and how strong that relationship is.
A correlation coefficient is a numerical measure of such correlation.
According to the Cauchy–Schwarz inequality it has a value between +1 and −1,
where 1 is total positive linear correlation, 0 is no linear correlation, and −1
is total negative linear correlation. One of the axioms of automated testing is
that tests are independent, and therefore the correlation coefficient between
any two tests should be equal to 0. But often it isn't. In this work, we are
going to present a method of evaluating test suite quality based on the
correlation coefficient and of finding a suite's weak points. Using PC Engines
open-source firmware regression test results, which are based on over 140
automated tests run with 2 flavors of software on 4 different platforms, we will
show how the quality of a test suite can be described numerically, and how those
results can be used to optimize test criteria.
As far as automated testing is concerned, every test has only two possible
outcomes: pass or fail. Pearson's correlation coefficient is defined as the
covariance of two variables divided by the product of their standard
deviations, so the first question was how to apply it to Boolean variables. We
assumed that the only outcome that matters is the failure of a test. During the
lecture, we will present how mathematical analysis can reveal potential flaws in
test criteria by targeting test cases that have a high chance of failing
simultaneously.
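
The idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact encoding, so the sketch assumes a failure is encoded as 1 and a pass as 0, and applies Pearson's formula directly to those values. The test names, the sample results, the 0.8 threshold, and the helper functions failure_correlation and correlated_pairs are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def failure_correlation(a, b):
    """Pearson correlation of two equal-length sequences of 0/1 failure flags.

    Returns None when either test never changes its outcome, because the
    standard deviation is then zero and the coefficient is undefined.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


def correlated_pairs(results, threshold=0.8):
    """List test pairs whose failure correlation is at least `threshold`."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(results.items(), 2):
        r = failure_correlation(a, b)
        if r is not None and r >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical data: one entry per regression run, 1 = fail, 0 = pass.
results = {
    "boot_test": [1, 0, 0, 1, 0, 1, 0, 0],
    "usb_test": [1, 0, 0, 1, 0, 1, 0, 0],
    "serial_test": [0, 1, 0, 0, 0, 0, 1, 0],
    "always_pass": [0, 0, 0, 0, 0, 0, 0, 0],
}

for name_a, name_b, r in correlated_pairs(results):
    print(f"{name_a} and {name_b} tend to fail together (r = {r:.2f})")
```

In this made-up data set, boot_test and usb_test fail in exactly the same runs, so they are the only pair reported. A pair like this is a candidate for review: the two tests may share a dependency or check the same thing. A test that never fails, such as always_pass, has no defined coefficient and is skipped.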