Measuring MC/DC Coverage of Pair

Software Testing Research Group (STRG)
An Evaluation of MC/DC Coverage for Pair-wise Test Cases
By David Anderson
Background
• Software is becoming larger and more complicated, which naturally means the cost
and time associated with testing are increasing. According to a National Institute of
Standards and Technology report, software bugs cost the U.S. economy an estimated
$59.5 billion annually.
• The same report indicated that more than one third of that amount, or $22.2 billion,
could be saved by improving testing infrastructure.
• New research needs to be conducted to find more cost-effective ways to test
software.
The Proposal
• This project proposes the integration of pair-wise testing and MC/DC to create a new
framework to help software developers test their products in a more cost effective
way.
• This part of the project is concerned primarily with measuring MC/DC coverage using
test cases generated by pair-wise testing.
• Future parts include research into how to improve MC/DC coverage of pair-wise test
suites and developing tools that integrate these two testing techniques into one
framework.
Definitions
• Pair-wise testing: a testing technique that exercises interactions between variables
using a small number of tests that together cover every possible combination of values
for every pair of parameters.
• Modified Condition/Decision Coverage (MC/DC): a code coverage criterion that
requires that every point of entry and exit in a program is executed at least once, every
condition in a decision takes on all possible outcomes at least once, every decision
takes on all possible outcomes at least once, and each condition is shown to
independently affect its decision’s outcome.
Pair-wise example
Consider the Boolean equation: d = (A ∧ B) ∨ C
The following are acceptable test cases for full pair-wise coverage.
      A   B   C
t1    1   1   0
t2    1   0   1
t3    0   1   1
t4    0   0   0
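The claim that these four tests give full pair-wise coverage can be checked mechanically. The following is a minimal sketch, not part of the original study; the function name `uncovered_pairs` is hypothetical. It lists every value combination of every pair of variables that no test case exercises.

```python
from itertools import combinations, product

# Test cases t1..t4 from the table above, as (A, B, C) tuples.
tests = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]

def uncovered_pairs(tests, n_vars):
    """Return (i, j, value_i, value_j) combinations that no test exercises."""
    missing = []
    for i, j in combinations(range(n_vars), 2):
        seen = {(t[i], t[j]) for t in tests}
        for combo in product((0, 1), repeat=2):
            if combo not in seen:
                missing.append((i, j) + combo)
    return missing

# An empty list means every pair of variables takes all four value combinations.
print(uncovered_pairs(tests, 3))
```

Dropping any one of the four tests leaves at least one pair of values uncovered, which is why four is the minimum for three Boolean variables.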
Pair-wise facts
• Pair-wise is a powerful black-box testing technique. Extensive research has been
conducted on this technique with outstanding results.
• Pair-wise testing needs significantly fewer test cases than exhaustive testing, and
the reduction grows with the size of the system being tested.
MC/DC example
Consider the Boolean equation: d = (A ∧ B) ∨ C
The following are acceptable test cases for full MC/DC coverage
      A   B   C   d
t1    1   1   0   1
t2    1   0   1   1
t3    1   0   0   0
t4    0   1   0   0
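The independence requirement can be illustrated with a small check. This is a sketch under one stated assumption: it uses the strict "unique-cause" reading of MC/DC, where two tests form an independence pair for a condition if they differ only in that condition and yield different outcomes. The names `d` and `independence_pairs` are hypothetical, and coverage tools may account for coverage differently.

```python
from itertools import combinations

def d(a, b, c):
    """The decision from the example: d = (A and B) or C."""
    return int((a and b) or c)

# Test cases t1..t4 from the table above, as (A, B, C) tuples.
tests = [(1, 1, 0), (1, 0, 1), (1, 0, 0), (0, 1, 0)]

def independence_pairs(decision, tests):
    """Map each condition index to the test pairs that differ only in that
    condition and change the decision outcome (unique-cause MC/DC)."""
    n = len(tests[0])
    pairs = {i: [] for i in range(n)}
    for t, u in combinations(tests, 2):
        diff = [i for i in range(n) if t[i] != u[i]]
        if len(diff) == 1 and decision(*t) != decision(*u):
            pairs[diff[0]].append((t, u))
    return pairs

for index, found in independence_pairs(d, tests).items():
    print("ABC"[index], found)
```

Here A is shown independent by t1 and t4, B by t1 and t3, and C by t2 and t3, so each condition has at least one independence pair.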
MC/DC facts
• MC/DC is a white-box testing technique that ensures adequate coverage of decisions
in software.
• MC/DC is used in standards DO-178B and DO-178C to ensure adequate testing of
safety-critical software. In particular, the FAA has adopted this technique for the
testing of airborne software.
• Given a decision with N conditions, on average N+1 test cases are needed to satisfy
MC/DC coverage. For comparison, exhaustive testing requires 2^N test cases.
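To make the gap concrete, the short sketch below tabulates N + 1 against 2^N, the number of input combinations for N Boolean conditions. The helper name `count_rows` is hypothetical, and N + 1 is only the typical figure quoted above, not a guarantee for every expression.

```python
def count_rows(max_n):
    """Return (N, N + 1, 2**N) rows for N = 1..max_n."""
    return [(n, n + 1, 2 ** n) for n in range(1, max_n + 1)]

for n, mcdc, exhaustive in count_rows(8):
    print(f"N={n}: about {mcdc} MC/DC tests vs {exhaustive} exhaustive tests")
```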
Why combine MC/DC and pair-wise?
Reason 1: Effectiveness of testing Boolean Expressions
Pair-wise - weak
MC/DC - strong
Pair-wise testing is not very effective at testing Boolean
expressions. This has been demonstrated in the paper
“Effectiveness of Pair-wise Testing for Software with
Boolean Inputs” by W. Balance, S. Vilkomir, and W.
Jenkins. In this study, pair-wise testing was only slightly
more effective than random testing.
MC/DC is designed for testing complex Boolean
expressions. Many studies have been conducted on the
effectiveness of MC/DC with very positive results. In
avionics software it is not uncommon to have Boolean
expressions with 6+ variables. MC/DC was created
specifically to adequately test this kind of complex logic.
Why combine MC/DC and pair-wise?
Reason 2: Cost of implementation
Pair-wise - relatively inexpensive
MC/DC - expensive
Pair-wise testing, and combinatorial testing in general, is
relatively cheap to implement. This comes from the
black-box nature of the technique. A relatively small set
of input data is needed for full pair-wise coverage.
Since MC/DC is a white-box technique, testing of the
underlying code is necessary. In particular, each Boolean
expression must have an individual set of test data to
achieve full MC/DC coverage. This makes implementing
MC/DC very time-consuming and expensive.
Tools used
• Automated Combinatorial Testing for Software (ACTS): A tool developed by NIST that
is used to generate combinatorial (in this case pair-wise) test cases for specified input
variables.
• CodeCover: An Eclipse plugin developed at the University of Stuttgart that is used to
measure various code coverage metrics including MC/DC. This was the main tool used
for measuring coverage.
• CTC++: A commercial tool by Verifysoft for measuring coverage of C/C++ programs.
This tool was used to verify the correctness of the data from CodeCover.
Demonstration
For this demonstration, consider the Boolean expression:
(A ∧ B) ∨ (C ∨ D)
Part 1: Generating Pair-wise test cases with ACTS
Part 2: Measuring MC/DC Coverage with CodeCover
Simple program
Note
While the previous example obtained 87.5% MC/DC Coverage, the results are not
always this good…
The Experiment
Two categories of expressions
• Boolean expressions were categorized as either “Simple” or “Complex”.
• Simple expressions were defined as expressions without repetition in variables while
complex expressions contained repetition.
• For example:
Simple:  A ∧ (B ∨ C)
Complex: (A ∧ B) ∨ (¬A ∧ C) ∨ (¬B ∧ ¬C)
• The reasoning behind this was that complex expressions add more points of
measurement to the expressions. In complex expressions, each instance of the
variable in the expression has to be covered while in simple expressions each variable
only has one point to be covered.
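The simple/complex distinction can be expressed as a check for repeated variables. The sketch below assumes, purely for illustration, that variables are single uppercase letters; the function name `classify` is hypothetical and not taken from the study.

```python
import re
from collections import Counter

def classify(expression):
    """Return 'complex' if any variable occurs more than once, else 'simple'.
    Assumes variables are single uppercase ASCII letters."""
    counts = Counter(re.findall(r"[A-Z]", expression))
    return "complex" if any(c > 1 for c in counts.values()) else "simple"

print(classify("A ∧ (B ∨ C)"))
print(classify("(A ∧ B) ∨ (¬A ∧ C) ∨ (¬B ∧ ¬C)"))
```

In the second expression each of A, B, and C appears twice, so there are six points to cover instead of three.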
Comparison with random test cases
• For each size of expression, one set of pair-wise test cases and three sets of random
test cases were generated.
• Random test cases were generated simply by using a random number generator and
converting that number into binary.
• Each set of random test cases had the same number of cases as the pair-wise set for
that expression size.
• The goal was to see if pair-wise test cases obtain better levels of MC/DC than
randomly generated test cases.
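The random generation step described above can be sketched as follows. The slides do not say which generator was used or whether duplicate test cases were allowed, so the seeded `random.Random` instance and the function name `random_test_cases` are assumptions made for illustration only.

```python
import random

def random_test_cases(n_vars, n_cases, seed=None):
    """Draw n_cases random integers and convert each to an n_vars-bit tuple.
    Duplicates are possible; the original procedure may have differed."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        number = rng.randrange(2 ** n_vars)
        bits = format(number, f"0{n_vars}b")  # zero-padded binary string
        cases.append(tuple(int(b) for b in bits))
    return cases

# Example: one random set sized like the 4-variable pair-wise set (6 cases).
print(random_test_cases(4, 6, seed=0))
```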
Experiment design
Simple Expressions

Number of   Number of     Pair-wise   Pair-wise    Random   Random
variables   expressions   sets        test cases   sets     test cases
3           6             1           4            3        12
4           12            1           6            3        18
5           10            1           6            3        18
6           10            1           7            3        21
7           10            1           7            3        21
8           10            1           8            3        24
Total       58            6           38           18       114
Experiment design
Complex Expressions

Number of   Number of     Pair-wise   Pair-wise    Random   Random
variables   expressions   sets        test cases   sets     test cases
3           6             1           4            3        12
4           10            1           6            3        18
5           10            1           6            3        18
6           10            1           7            3        21
7           10            1           7            3        21
8           10            1           8            3        24
Total       56            6           38           18       114
Results
Comparison based on size
[Charts: two panels, "Simple Expressions" and "Complex Expressions"; MC/DC Coverage (%)
for 3-var to 8-var expressions; series: Pair-wise, Random]
Comparison based on complexity
[Chart: MC/DC Coverage (%) for 3-var to 8-var expressions; series: Simple, Complex]
Summary of Results
MC/DC coverage (%)

           Simple                Complex               Both
           Pair-wise   Random    Pair-wise   Random    Pair-wise   Random
3-var      77.8        75.9      77.7        74.2      77.7        75
4-var      76.0        73.3      80.3        71.8      78          72.6
5-var      70.0        64.7      73.4        66.4      71.7        65.4
6-var      64.2        60.3      62.7        66.6      63.4        63.4
7-var      59.3        50.2      60.4        57.8      59.9        54.0
8-var      57.5        53.9      49.4        52.9      53.5        53.4
Average    67.1        62.5      66.5        64.3      66.8        63.4
Analysis
• The data found in this experiment suggests that pair-wise test cases obtain only
slightly better coverage than randomly generated test cases.
• Coverage for simple and complex expressions did not appear to differ
significantly.
• With larger expressions, coverage appeared to slowly decrease.
• Coverage appeared to be highly dependent on the structure of individual
expressions, with high variance within sets of data.
Stability of Results
• It should be noted that the range of the results was very high.
• Note the chart to the right. This is a sample from one set of 4-variable
expressions. As you can see, there is a wide range of coverage levels for both
pair-wise and random tests.
[Chart: MC/DC Coverage (%) for expressions 1 to 12; series: Pair-wise, Random]
What does this mean?
• Because of this high range and variance of MC/DC coverage level, this data only
presents a good average of coverage when many expressions of different sizes,
complexities, and structures are measured together.
• This average would not be suitable as a predictor for coverage of individual
expressions, or for software from the industry.
Current Work
Analyzing coverage for one large set of test data
• In the previous experiment, each size of expression had different test data.
• In this experiment, one set of test data for 10 Boolean variables is used. Expressions
of different sizes and containing different subsets of these 10 variables are tested and
coverage is measured.
• This approach better matches the structure of industry software by using one set of
test data for many expressions of different sizes.
Industry Software
• Since the long-term goal of this project is to create a framework for developers to
test their code in a more effective way, applying this approach to software from the
industry is important.
• Repositories exist such as the Software-Artifact Infrastructure Repository that contain
many examples of software intended for experiments such as this one.
• The results for this could be very different from the results of measuring coverage
of individual expressions.
Methods for Improving Coverage
• Now that we have some data for coverage, a next step is to look for methods to
improve this coverage.
• Methods could be increasing interaction strength (3-wise, 4-wise, etc.) or adding
additional test cases to the pair-wise sets based on some criteria.
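One possible criterion for adding test cases can be sketched as a greedy augmentation: for each condition that lacks an independence pair, add the fewest tests needed to create one. This is a hypothetical illustration, not the method of this project. It assumes the strict unique-cause reading of MC/DC and searches all 2^N input vectors, so it only suits small expressions. All function names are invented for the example.

```python
from itertools import product

def decision(a, b, c):
    """Example decision: (A and B) or C."""
    return int((a and b) or c)

def flip(t, i):
    """Return test t with condition i inverted."""
    return t[:i] + (1 - t[i],) + t[i + 1:]

def covered(dec, tests, i):
    """True if some pair of tests differs only in condition i and changes the outcome."""
    s = set(tests)
    return any(flip(t, i) in s and dec(*t) != dec(*flip(t, i)) for t in s)

def augment(dec, tests, n_vars):
    """Greedily add tests until each condition has an independence pair, where possible."""
    tests = list(tests)
    for i in range(n_vars):
        if covered(dec, tests, i):
            continue
        # Input vectors where flipping condition i changes the outcome.
        candidates = [t for t in product((0, 1), repeat=n_vars)
                      if dec(*t) != dec(*flip(t, i))]
        if not candidates:
            continue  # condition i can never affect the outcome
        # Prefer the candidate pair that needs the fewest new tests.
        best = min(candidates,
                   key=lambda t: (t not in tests) + (flip(t, i) not in tests))
        for t in (best, flip(best, i)):
            if t not in tests:
                tests.append(t)
    return tests

pairwise = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]
print(augment(decision, pairwise, 3))
```

For the three-variable pair-wise set used earlier, this sketch adds two tests, (0, 1, 0) and (1, 0, 0), after which every condition has an independence pair. Whether such a criterion is cost-effective on realistic code is exactly the open question raised above.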
Any Questions?
