Writing robust test cases

I became a big fan of unit testing during my work on the open source project OpenSwitch.net at Hewlett Packard Enterprise. The software development team followed Agile software development, with a strong emphasis on delivering each feature together with a good chunk of unit test code. This unit test code, usually written in a scripting language like Python, could later be run to verify the feature against further code changes, or be deployed in continuous integration (CI) systems as part of the build, test and release cycle. This meant that even a small code change could trigger hundreds or thousands of test cases to ensure it would not degrade the quality of the existing code base. It is a powerful way to keep feature velocity high and maintain code quality when managing huge code bases on the order of millions of lines of code.

But unit test code is code, after all. At times the unit test code is larger than the actual production code! Like the feature code, test code can have hidden bugs or fail under certain conditions. Since much of the unit and integration test code is part of the continuous integration machinery, the quality of tests is critical to fast feature velocity. I have seen developers wait days or weeks to check in their code because, thanks to bad test code, tests unrelated to their changes would fail intermittently in the build, test and release pipeline. It is very frustrating to see critical pieces of code just lying around in GitHub review links waiting for someone to fix the failing tests.

So the problem is writing better test cases: ones that survive changing code and stand up to the various scenarios in which they are deployed. Some might argue that the quality of test cases comes with experience, just as the craft of writing code is polished over years. True, but not entirely. I feel a few basic rules of thumb are essential for software developers to write robust test cases. These recommendations may not help everyone, but they will certainly help you think about how not to write brittle or bad test cases.

I believe writing good unit test cases comes down to two groups of practices: 1) syntactic and semantic good practices, and 2) system and test execution good practices. So let's get started. (The following code snippets are written in C#.)

Syntactic and Semantic Best Practices for Writing Tests

These practices are about applying good programming habits to the test code itself. As developers, we often overlook the quality of test code, which makes failures harder to debug when tests break in builds or before deployment. Let's review some best practices that help us write less flaky tests.

Use asserts wisely and as they are meant to be used

Developers often neglect to add a meaningful message describing why the test case failed. Consider the test case below:-

int expectedValue = 5;
int actualValue = 4;
Assert.AreEqual(expectedValue, actualValue);

Even though the above test case will fail, there is no message documenting what the failure means. Most unit testing frameworks provide a version of Assert() that displays an error message when the test case fails. So the above code might be modified as follows:-

int expectedValue = 5;
int actualValue = 4;
Assert.AreEqual(expectedValue, actualValue, "Let's document why this assert will fail");

Use regular expressions instead of string comparisons

As much as possible, use your language's regular expression support instead of strict string comparisons. A suitably general regular expression is less likely to cause a spurious test failure than a strict string-based comparison. Consider the following two unit test snippets that look for a text pattern in the someBuffer string.

// A.
Assert.IsTrue(
    someBuffer.Contains("via  11.11.11.1,  [1/0],  static"),
    "This is a strict string comparison.. Should fail often"
);

// B. (requires: using System.Text.RegularExpressions;)
Assert.IsTrue(
    Regex.IsMatch(someBuffer, @"(.*)11\.11\.11\.1(.*)\[1/0\](.*)static"),
    "This is a regular expression based comparison.. " +
    "Should pass the test of time"
);

Verification A is more likely to fail than verification B if the output in someBuffer changes in the number of white spaces, which is a common scenario in a rapidly changing code base. Verification B uses a regular expression and is therefore more robust against the addition or deletion of white space in the text under test.

Test validations should be done on the required output only

Validating more output than the test case requires can introduce bugs into the test case itself, which defeats the purpose of writing unit tests. Consider a hypothetical test case that should fail if a given test sentence does not contain the word process. A typical unit test statement is written below:-

string sentence = "Some words need to processed "
                  "again to keep the process going";
Assert.IsTrue(
    sentence.Contains("process"),
    "The sentence doesn't contain the word 'process'"
);

The assertion statement looks well-formed. However, if the word process in the test sentence is actually misspelled, the test case still won't fail. Consider the code snippet below:-

string sentence = "Some words need to processed again to keep the precess going";
Assert.IsTrue(sentence.Contains("process"), "The sentence doesn't contain the word 'process'");

Why doesn't the test case fail even though the test sentence no longer contains the word process? Because the word processed is present in the sentence and contains "process" as a substring. The test case wasn't verifying against the exact word; the data it matched against was superfluous. One way to fix the test case is to tokenize the test sentence into individual words and verify each word:-

string sentence = "Some words need to processed again "
                  "to keep the process going";
string[] words = sentence.Split(' ');
bool if_found = false;

foreach (string word in words)
{
    if (word.Length == "process".Length && word.Contains("process"))
    {
        if_found = true;
        break;
    }
}

Even though the above example is very basic, and it might feel like I have cooked up a contrived error scenario, I have seen test cases for production code where developers validated against large blocks of text, and the tests missed the error scenarios most of the time. As a result, bad code gets into production, and when the shit hits the fan, your manager wonders how and why the bad deployment happened.

System Best Practices for Test Execution

These system-level practices ensure that tests execute reliably and are not affected by variations in the build system. They also ensure that test execution is optimized to run in the minimum possible time, and that tests are not flaky because of limits on memory buffers and other resources.

Tests should be independently executable

The test cases in a single test class should be executable independently of each other; the result of one test case shouldn't affect the execution of another. Many unit test frameworks, such as JUnit, pytest and Visual Studio's MSTest, can parallelize test execution or run test cases in a non-deterministic order. In either case, test cases that depend on other test cases can fail, and in unpredictable, hard-to-reproduce ways.
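
As a rough sketch of the kind of coupling to avoid, the hypothetical MSTest class below shares mutable state between two test cases, so the second one only passes if the first has already run in the same process:-

using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ShoppingCartTests
{
    // Shared mutable state couples the tests below to their execution order.
    private static readonly List<string> cart = new List<string>();

    [TestMethod]
    public void AddItem_PutsItemInCart()
    {
        cart.Add("book");
        Assert.AreEqual(1, cart.Count, "Expected exactly one item in the cart");
    }

    [TestMethod]
    public void Checkout_SeesItemFromPreviousTest()
    {
        // Brittle: only passes if AddItem_PutsItemInCart has already run.
        // The fix is to give each test its own cart, created in the test
        // body or in a [TestInitialize] method.
        Assert.AreEqual(1, cart.Count, "Expected the item added by the previous test");
    }
}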

Have test setup and test cleanup wherever needed

Common, costly steps required before test cases run should be made part of the test setup, and common steps required afterwards should be made part of the test cleanup. This has a two-fold benefit. First, it reduces the size of your test code. Second, common or expensive steps, such as establishing a TCP session with a server or parsing text data, are performed only once during the execution of all test cases in a given test file.
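
As a minimal sketch using MSTest attributes (the server endpoint here is a hypothetical placeholder), ClassInitialize and ClassCleanup run once per test class, which is where an expensive step like establishing a TCP session belongs:-

using System.Net.Sockets;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ServerSessionTests
{
    // Expensive shared resource created once for the whole test class.
    private static TcpClient client;

    [ClassInitialize]
    public static void SetUpClass(TestContext context)
    {
        // Hypothetical test server; the session is established only once.
        client = new TcpClient("test-server.example.com", 8080);
    }

    [ClassCleanup]
    public static void TearDownClass()
    {
        client?.Close();
    }

    [TestMethod]
    public void SessionIsConnected()
    {
        Assert.IsTrue(client.Connected, "Expected the shared TCP session to be connected");
    }
}

For steps that must run around every individual test case, TestInitialize and TestCleanup play the same role at the per-test level.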

Create reusable libraries for parsing textual data

If you are parsing a lot of textual data and basing your test cases on the parsed results, it is a good idea to create reusable libraries that perform the parsing and extraction and return the data to your test cases as formal structures. Populating data structures with the parsed results also lets you leverage the language's built-in comparison support; comparing whole structures is more efficient than calling individual assert statements on various data fields.
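
As an illustrative sketch, assuming a routing-table line like the one used earlier, a small parsing helper can return a C# record so that a single equality assert covers every field:-

using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Structured result the parsing library hands back to test code.
public record RouteEntry(string NextHop, string Distance, string Source);

public static class RouteParser
{
    // Extract next hop, distance/metric and route source from one line of output.
    public static RouteEntry Parse(string line)
    {
        var match = Regex.Match(line, @"via\s+(\S+),\s+\[(\S+)\],\s+(\w+)");
        return new RouteEntry(match.Groups[1].Value,
                              match.Groups[2].Value,
                              match.Groups[3].Value);
    }
}

[TestClass]
public class RouteParserTests
{
    [TestMethod]
    public void ParsesStaticRoute()
    {
        var actual = RouteParser.Parse("via  11.11.11.1,  [1/0],  static");

        // Records compare by value, so one assert checks all three fields.
        Assert.AreEqual(new RouteEntry("11.11.11.1", "1/0", "static"), actual,
                        "Parsed route did not match the expected entry");
    }
}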

Do not perform test validation on large string buffers

Never read large files or console output into a single string buffer and perform validations over that buffer. The usable size of the buffer depends on the memory available on the machine and on language defaults (maximum string sizes are usually very large, but not unlimited), which may leave your string with truncated data at validation time. This makes your test case brittle, and it may fail randomly on different systems. Instead, read data in small chunks from the file or console and perform the necessary checks over these smaller units of data.
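
A minimal sketch of this idea, assuming a hypothetical routes.log output file: stream the data line by line and validate each line, instead of slurping everything into one string first:-

using System.IO;
using System.Text.RegularExpressions;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LogValidationTests
{
    [TestMethod]
    public void LogContainsStaticRoute()
    {
        bool found = false;

        // Read the output line by line so the test never depends on how much
        // text fits into a single string buffer.
        using (var reader = new StreamReader("routes.log"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (Regex.IsMatch(line, @"(.*)11\.11\.11\.1(.*)\[1/0\](.*)static"))
                {
                    found = true;
                    break;
                }
            }
        }

        Assert.IsTrue(found, "No static route via 11.11.11.1 was found in the log");
    }
}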