# NIST Covering Array Tables - About these pages

## An Introduction to Covering Arrays

### Motivation

A covering array is a mathematical object that can be used for software testing purposes (see the Hartman paper for another introduction). Software often has many possible configurations, each of which could be faulty. Software testing tries to make sure that no actual error occurs in any such configuration. One possible method to achieve this goal is to simulate all possible configurations and check that no error occurs. However, this is typically impractical because there are too many configurations to check.

There are many ways to approach software testing in a more practical way, but few of them capture the abstraction and generality of simply running a ``test and check'' procedure on various sample configurations. To make this method more practical, we can realize that testing all possible configurations of a software program can be redundant and it might be possible to achieve an adequate simulation of all of the program's behavior with a much smaller number of carefully chosen configurations.

### Definition

The notion of coverage is used to quantify how well a set of sample configurations ``covers'' the set of all configurations. Intuitively, full coverage means that all configurations are chosen and no coverage means that none are chosen. Intermediate levels of coverage are defined in terms of a parameter t. There are two additional parameters that represent the total number of configurations: k is the number of variables a configuration needs to specify and v is the number of possible values each of the k variables can take on. While in practice each variable can take on a different number of values, we only consider the homogeneous case (where all inputs take on the same number of values) on this site. Taking t equal to k will produce full coverage and taking t equal to zero will produce no coverage. To understand what intermediate values of t mean we turn to an example.

We will consider a simple program that takes only four binary variables. Thus, k=4 and v=2. To take t=4 would give full coverage, and thus the following set of configurations to test the program on:

 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 1

If we look at this list of configurations, we notice that whichever two columns out of the four columns are chosen, all possible pairs of values come up. Specifically, the pairs 00, 01, 10 and 11 all appear in the rows when we look at the first and second columns only, the first and third columns only, etc. This is intuitive because the table fully covers the configurations. However, there are other configurations that might also share this property. Specifically, we can consider the table:

 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0

This table of five configurations does indeed have the property that for each pair of columns chosen, all possible configurations of the two inputs appear. This property is called 2-coverage, and corresponds to when t=2. By abstracting away from the idea of software testing we see this is just an array of numbers and because it satisfies the 2-coverage property we call it a covering array. We can naturally generalize this to arbitrary t by defining t-way coverage to hold on an array when for every choice of t columns from the k columns, all possible v^t t-tuples appear within the rows of the array.

### Optimality

Now that the definition of a covering array is put forth, we can seek optimal covering arrays. That is, for fixed t, v and k values, what is the size of the smallest covering array? As k determines the number of columns, this question is really asking what is the smallest number of rows that an array can have while satisfying the t-covering property. The above covering array with five rows is in fact optimal, as set forth by a paper by Kleitman and Spencer. Except in this special case for t=2 and v=2 it is generally unknown what are the sizes of optimal covering arrays. For example, the following covering array for t=3, v=2, and k=4 has 9 rows, but in fact 8 are sufficient (as listed in Charlie Colbourn's covering array tables).

 0 0 0 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1

### Applications

A major application of covering arrays is the task of software testing (see the Hartman paper). Any software package will have two natural parameters, the number of inputs and the number of values each input can take. For simplicity, this site only deals with the homogeneous case of when all of the inputs take on the same number of values. This first parameter corresponds to k and the second corresponds to v, as explained above (and below). A third parameter t needs to be chosen and this dictates the ``level'' of coverage of the input space. Ideally it would be possible to take t=k, but that typically results in too many configurations to test. Empirical studies show that for most software packages t need not be more than 6 to find all errors. Once the parameters are fixed, a covering array can be generated. The rows of the array will correspond to tests that need to be performed on the software package and thus need to be converted into a form the program can use. For example, if a system of networked printing was being tested the inputs might be the operating system, the web-browser, the network type and the time of day. If we take two values for each input, we could modify the above covering array into the test suite below for this system.

OSBrowserNetworkTime
MacOperawiredmorning
MacFirefoxwirelessafternoon
LinuxOperawirelessafternoon
LinuxFirefoxwiredafternoon
LinuxFirefoxwirelessmorning

### References

1. D. J. Kleitman and J. Spencer: Families of k-independent sets. Discrete Math.6 (1973), pp. 255. 262.
2. Software and Hardware Testing Using Combinatorial Covering Suites, A. Hartman, a chapter in "Graph Theory, Combinatorics and Algorithms: Interdisciplinary Applications", published by Kluwer Academic Publishers.

## The Variables of Covering Arrays - t, v, k, and CA(t,k,v)

The question of ``What is the size (number of rows) of the smallest covering array'' is parameterized by the variables t, v, and k:
• t - t is sometimes called the ``strength'' of the array. It is a measure of how ``fully'' we cover the possible rows in the array. For t=0, no rows need to be present and for t=k all rows need to be present (as the number of columns is equal to k). This is because any covering array must satisfy the t-covering property: when any t of the k columns are chosen, all v^t of the possible t-tuples must appear among the rows.

For example, the following array is not a covering array for t=2, v=2, and k=4, because the second and forth columns do not have the 2-tuple (1,0) among the rows.

 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 1
• v - v is the number of possible values that each entry in the array can take on. While the actual symbols chosen are irrelevant, they typically the numbers between 0 and v-1. v corresponds to the number of settings an input in the software can take on. While typically this varies for each input, these pages only consider the homogeneous case.

• k - k is the number of columns in the array. Thus, the ``size'' of an array is usually given as its number of rows, as the number of columns is fixed. k corresponds to the number of inputs in the software package that is being tested.

• CA(t,k,v) - CA(t,k,v) is the so-called ``covering array number'' of the variables t, k, and v. It is the number of rows in the associated covering array. The smallest possible array gives the covering array number denoted CAN(t,k,v), and so these pages present upper-bounds for CAN numbers.