Automatic Testing: Battleships Anyone?

In the last article, we made the case for augmenting testing with automatic algorithms to complement human effort. We explored what automatic methods could do, looking only at the source code and looked at the problems they could solve for us. This article picks up where we left off, and explains how we can analyse programs and try to find inputs to achieve code coverage automatically.

A Simple Programming Language

To start with, we'll use the simplistic programming language from the previous article that demonstrates how we can analyse a program to transform the activity of generating inputs into a search problem. Our language has the familiar constructs: branching (if statements), relational operators (equality, greater than, less than), and a return statement. The language has variables too, the values of which might be set elsewhere (but can't change).

			Please enable JavaScript to view the examples.
Figure 1: An example program

The example has a small amount of behaviour. We control three inputs, a, b, and c and one of three things will happen depending on that input. If the three inputs are the same, we will reach the first return statement. Otherwise, we will reach one of the two statements depending on the value of the inputs.

This example has a few things wrong with it. Firstly, the implementation is wrong: triangles are misclassified. There also aren't enough checks to make sure that the side lengths make a triangle. However, we can't see these faults by looking at the code in isolation. We only know about the faults because we know what a triangle is.

If the code can't tell us if it's broken, what does it tell us? We can take two different 'views' of code. The first perspective is 'static'; we run a tool that analyses the contents of the source file and looks for problems. Static analysis tools such as PVS-Studio and Coverity can uncover potential bugs using detection rules and data flow analyses. The other perspective is 'dynamic'. Instead of reading the source code, we watch what the program does, analysing it 'dynamically'. These approaches are complementary: they are both adept at finding particular flaws.

Dynamic analysis is well suited to problems where the source code can't be analysed, or where the results can't be easily modelled (more on models later). In this example, we'll use dynamic analysis to cover 100% of the decision points in the program. We can enumerate the paths through the program statically by tracing each branch. While it's straight forward to get the list of paths, generating inputs that follow those paths is more difficult.

			Please enable JavaScript to view the examples.
Figure 2: The path to the goal
To begin, suppose we want to reach the statement . In order to do that, we'll need to find a path through the program that allows us to reach that statement. We can compute the path by collecting the branches taken to reach the target, shown in the box on the left.

At this stage, we have a target, and we know the path we need to take to reach it. However, at each decision point in the path, there is a test we must satisfy, that our input values allow us to advance. This is where search and computing power come into the fold.

Finding Inputs

At each step along the path, there is a condition that we must solve. We can't simply try to solve each in turn, as values that satisfy one condition might conflict with later conditions. There are two key ways we can try to solve these conditions.


We can take the condition for the branch and analyse it symbolically. This works by taking each of the conditions from the branches, formulating it as a mathematical expression, and reasoning about it using, for example, an SMT solver like Z3. This is easy for simple, linear constraints, such as a == b, but becomes more difficult as the math gets harder and the constraints get larger. Pex (used in Visual Studio 2015's Smart Unit Tests) uses this approach and relies on a comprehensive set of "models" for various arithmetic, string, and IO operations that allow it to generate test cases for more complicated programs.


Instead of modelling the conditions and solving the constraint analytically, we can take a "scientific" approach. We try inputs, and use a modified program that replies with information on how close we were to the branch we targeted. Armed with that information, and information about previous attempts, we can try new values that might be more likely to reach the target. This approach is simpler to implement, and has the advantage that it only requires a minimum of understanding about the code to generate input data. Evosuite, and to an extent, fuzzers such as AFL, use this approach.

The program needs to tell us how close each input we try is to our target. In the example, we can easily tell if we reached a point on the target path by stepping through the program. However, if we miss the goal, we don't have the feedback we need to make the next guess. Unless we do something more intelligent, we can only make random guesses.

We get the feedback we need by modifying the program so that we measure the proximity of our inputs based on the difference between the input and the value that satisfies each constraint. When this is maximised, we've reached our target. We've gone from playing battleship ("hit" versus "miss") to playing Marco Polo (the feedback tells us where the target is compared to where we are).

In the case of our program, we can transform constraints into functions of the input variables, then search for inputs that maximise these functions. However, we must be careful that our transformation of values produces the right shape and gives us that proximity measure.

Figure 3: Detailed and undetailed feedback (Click here to switch feedback)
The example shows two encodings of the constraint x == 10. If you toggle the function, you can see the difference between the two. One produces the maximum value (100) only when x is 10, whereas the other takes the absolute value of 10 - x and subtracts it from 100, giving a slope towards the best value. Sloped shapes are ideal for search algorithms, as they give them a clue where to try next. In contrast, a spike at one value is hard to find without relying on luck.


Hitting Targets

Figure 4: Building up a picture with random guesses
In the search process we start with no information: we have to make a completely random guess. We can repeatedly make guesses across the input space to get an idea for where the values that reach our target are. The animation shows how this works. After a few random guesses, the "shape" becomes apparent. Instead of guessing randomly, we can use the shape to make a more informed choice. It's important that we don't have to understand the process that generates this shape; we can still reach the target without understanding the relationship between inputs and outputs.

There are multiple algorithms we can use to try to find these inputs. In our example, we only have one input which simplifies the problem a great deal. To find the maximum, there are a few strategies we can employ. The simplest is a random search, as used in the example above. If we wait long enough, we'll probably find the answer. Instead of relying on serendipitously finding the answer, we can be more methodical.

Figure 5: A simple search algorithm (switch feedback)
The example in Figure 5 shows how the optimum can be reached. We start from a random point, then move in the direction that produces the highest increase in fitness. If you toggle the feedback function, you can see how a more intelligent approach is able to find the maximum value by moving in whichever direction that yields an increase from the previous guess.

As the naive function is flat with the exception of the one good value, the search will give up (unless it randomly chooses it as its starting point). In contrast, using the absolute value of the distance gives a slope that is easy to traverse.

Covering Branches

Now we have a way to extract the constraints from each branch, plus an approach to represent constraints to run a searching algorithm. The final step is to combine them. In cases where we have multiple branches, we want to find values that satisfy both equations simultaneously. In a search-based context this is simple: we just aggregate the individual functions.

In practice, more advanced searching algorithms are used that reduce the number of inputs tried. There are a variety of approaches that can be used to minimise the time wasted exploring useless inputs, such as pattern search, which increases the stride it takes as long as fitness values increase. In principle, however, these algorithms are very similar to one another; at the very least they share the same goal.


This article showed how program analysis allows us to establish targets and generate feedback we can use to search for inputs. The next article in this series will expand on search algorithms, showing how the approach described here can be taken into practice by generating tests for our program with a more sophisticated search algorithm.

Read More