Comprehensive system testing of embedded software may require running many tests, sometimes hundreds, on expensive test systems. Even if the tests are automated, their execution takes valuable time, which means development teams may wait hours or even days for the test results. The earlier the test suite reports a bug, the earlier developers can react. Consequently, it is desirable to first run the tests that have the highest likelihood of finding bugs. In addition to identifying and prioritizing such “hot candidate tests”, the ideal automated regression test suite provides the information developers need to quickly assess an issue, for example by automatically running similar tests or the same test on another product variant. Sounds too good to be true? In this blog post, I describe how this became reality for my team testing a distributed real-time system.
Test impact analysis
Without any doubt, the most effective form of test case prioritization (TCP) in a system regression test suite is to first run the tests that execute recently changed lines of code: coverage-based test impact analysis. This, however, requires knowledge of the code coverage of the system tests, which can be acquired by means of code instrumentation, for example. If this information is unknown or cannot be obtained at all, we can try to approximate coverage-based test impact analysis. This is the way we went. Since statistics plays an important role in our approach, I will start by describing how we collect metrics related to testing and debugging.
Bug lifecycle and test records
For each bug found, we create a separate ticket in our bug tracker. The title of the ticket contains the identifier of the test(s) that revealed the bug. In the bug tracker, each bug is assigned a severity. The test team closes the bug ticket when the test(s) mentioned in the ticket pass after the fix. The bug tracker is integrated with the software configuration management (CM) tool: when developers fix a bug, they mention the bug ID in the commit message. In this way, our tools link the bug ticket to the code change of the fix.
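To illustrate the linking step, here is a minimal sketch that collects the bug IDs mentioned in commit messages between two software versions. The `BUG-1234` ticket pattern and the use of Git are assumptions for illustration; the actual CM tool and ID scheme may differ.

```python
import re
import subprocess

# Hypothetical ticket ID scheme; adapt to your bug tracker.
BUG_ID_PATTERN = re.compile(r"BUG-\d+")

def bugs_fixed_between(old_rev: str, new_rev: str) -> set[str]:
    """Collect bug IDs mentioned in commit messages between two revisions."""
    log = subprocess.run(
        ["git", "log", "--format=%s %b", f"{old_rev}..{new_rev}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(BUG_ID_PATTERN.findall(log))
```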
When our test environment has executed a test, it stores the following data in an SQL database: the software version under test, the test verdict, the test duration, and the current date. To keep our statistics “clean”, we do not store any data when we debug tests, i.e. when we execute a test on a feature branch in the CM tool. Further, we do not want to distort the data in the database when there is an issue with the test environment itself, e.g. a communication problem with a measurement device. For such cases, we introduced a third test verdict, ERROR, in addition to FAILED and PASSED.
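To make the stored record concrete, here is a minimal sketch of such a results table using SQLite. The table layout and column names are illustrative, not our actual schema.

```python
import sqlite3

# Illustrative schema; the production database differs in detail.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_runs (
    test_id    TEXT NOT NULL,  -- identifier of the test case
    sw_version TEXT NOT NULL,  -- software version under test
    verdict    TEXT NOT NULL CHECK (verdict IN ('PASSED', 'FAILED', 'ERROR')),
    duration_s REAL NOT NULL,  -- test duration in seconds
    run_date   TEXT NOT NULL   -- date of the test run
);
"""

def record_run(db: sqlite3.Connection, test_id: str, sw_version: str,
               verdict: str, duration_s: float, run_date: str) -> None:
    """Store one test result; debug runs on feature branches are never recorded."""
    db.execute("INSERT INTO test_runs VALUES (?, ?, ?, ?, ?)",
               (test_id, sw_version, verdict, duration_s, run_date))
    db.commit()

db = sqlite3.connect("results.db")
db.execute(SCHEMA)
```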
When we start a test suite and there is an open bug ticket for a test, we can choose either to skip such tests or to execute them and add a link to the bug ticket in the test summary report.
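A small sketch of that decision point, assuming the bug tracker's API has already provided a mapping from test IDs to open tickets (all names here are hypothetical):

```python
def split_by_open_tickets(tests, open_tickets):
    """Separate tests with an open bug ticket from the freely runnable ones.

    `open_tickets` maps test IDs to ticket URLs, as fetched from the bug
    tracker's API; the structure is illustrative.
    """
    runnable = [t for t in tests if t not in open_tickets]
    blocked = {t: open_tickets[t] for t in tests if t in open_tickets}
    return runnable, blocked  # either skip `blocked`, or run them and
                              # link the ticket in the summary report
```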
Static test case prioritization
I implemented the following two ideas to approximate coverage-based test impact analysis:
- Raise the priority of tests that are new. Most likely, new tests exercise new features and thus new code.
- Raise the priority of tests that failed recently. The bugs revealed by those runs may just have been fixed, so these tests, too, have a high probability of executing new code.
Inspired by an academic publication, I added a third, very simple rule to this static test prioritization:
- Raise the priority of tests that have a high bug-finding score.
I can confirm from personal experience that this simple measure makes sense: out of 1500 automated tests in our test suite, we have 3 tests that found more than 10 bugs each and several hundred that never found a single one. It is only logical to give these three super-heroes a higher priority than other tests.
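Taken together, the three rules boil down to a small scoring function. The sketch below shows one possible shape; the weights, the cap, and the field names are made up for illustration and are not our production values.

```python
from dataclasses import dataclass

@dataclass
class TestStats:
    """Per-test statistics queried from the results database (names illustrative)."""
    test_id: str
    is_new: bool           # no recorded runs yet
    failed_recently: bool  # last recorded verdict was FAILED
    bugs_found: int        # number of bug tickets referencing this test

def static_priority(stats: TestStats) -> int:
    """Heuristic priority score for static test case prioritization."""
    score = 0
    if stats.is_new:
        score += 10        # new tests likely execute new code
    if stats.failed_recently:
        score += 10        # the fix for the revealed bug may just have landed
    score += min(stats.bugs_found, 10)  # reward proven bug-finders, capped
    return score
```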
Painting black box tests white
There is more that we can do with respect to static prioritization. And this next idea, implemented by my team, indeed resembles coverage-based test impact analysis: exploiting the coverage information contained in bug tickets. Remember that all our bug tickets link to the source code that fixed the bug and contain the IDs of the tests that revealed it. Bug tickets thus connect system tests to the code of the software under test. We dramatically raise the priority of tests that are linked to source code parts that have just changed. The search for such tests runs automatically, accessing the bug tracker and the CM tool via their application interfaces.
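Here is a minimal sketch of that search. It assumes we have already collected, for each closed bug ticket, the IDs of the tests that revealed the bug and the files touched by the fix; that data shape is an assumption for illustration.

```python
def impacted_tests(changed_files, tickets):
    """Find tests whose historical bug fixes touched recently changed files.

    `tickets` is an iterable of (test_ids, fixed_files) pairs assembled from
    the bug tracker and the CM tool; the structure is illustrative.
    """
    changed = set(changed_files)
    hot = set()
    for test_ids, fixed_files in tickets:
        if changed & set(fixed_files):  # a fix location was changed again
            hot.update(test_ids)
    return hot
```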
Finally, if two tests have the same priority, we schedule the one with the shorter execution time first. For large test suites, this can make a difference.
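The final static ordering then reduces to a single sort key, as this sketch shows. Sorting descending by priority and ascending by duration means that high-value, short tests report first.

```python
def schedule(tests, priority, avg_duration):
    """Order tests by descending priority; ties go to the shorter test.

    `priority` and `avg_duration` are lookup functions, e.g. the scoring
    function above and a query of average runtimes from the results database.
    """
    return sorted(tests, key=lambda t: (-priority(t), avg_duration(t)))
```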
Dynamic test case prioritization
The paper mentioned above shows that, during a test campaign, the most recent test verdict can be used to dynamically re-order the pending tests and thereby increase the efficiency of the test suite. To achieve this, the researchers employ rule mining and multi-objective search. I have adopted their idea of re-ordering tests; my approach, however, does not involve anything complicated. I am happy to present it in the next blog post.
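Without spoiling that post: the skeleton of any such dynamic scheme is a loop that re-ranks the remaining queue after each verdict. Everything in this sketch, in particular the `rescore` function, is a placeholder for what the next post will fill in.

```python
def run_campaign(pending, run_test, rescore):
    """Dynamic TCP skeleton: re-order the pending queue after every verdict.

    `run_test` executes a test and returns its verdict; `rescore` ranks a
    pending test given the history so far. Both are placeholders here.
    """
    history = []
    while pending:
        pending.sort(key=lambda t: rescore(t, history), reverse=True)
        test = pending.pop(0)
        history.append((test, run_test(test)))
    return history
```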
References and further reading
- Dipesh Pradhan, Shuai Wang, Shaukat Ali, Tao Yue, Marius Liaaen: “REMAP: Using Rule Mining and Multi-Objective Search for Dynamic Test Case Prioritization”. 2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST).
- Improving test efficiency: 7 simple steps to cope with the “testing is taking too long” problem