Mutation-Based Testing

David Pryor
Mutation-Based Testing
 Same basic goal as Code Coverage
 Evaluate the tests
 Determine “how much” of the code is exercised
 Mutation testing goes beyond checking which lines of code
were executed
 Goal: Distinguish statements that are simply executed from
those that are fully tested
 History
 Theory published in 1971 by a student
 Computationally infeasible at the time
 Technological advances from the 1990s to the present have made it more practical
 Still not mainstream, but gaining popularity
Basic Procedure
 Requirements
 Complete working program
 Tests written and passing
 Make a small change (mutation)
to the source code of the program
 Run the tests
 If a test fails:
 The test “killed” this mutant
 If all tests still pass, either:
 The mutated code is redundant/unnecessary, or
 The tests are incomplete
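The basic procedure can be sketched in Python. This is an illustration under stated assumptions, not any framework's API. The program under test is kept as source text. One hypothetical operator (AddToSub) rewrites its syntax tree. The mutant counts as "killed" if the hypothetical test suite (run_tests) fails. For brevity, this operator changes every "+" at once, whereas real frameworks usually make one change per mutant.

```python
import ast
from typing import Any, Dict, List

# Hypothetical program under test, kept as source text so it can be mutated.
SOURCE = """
def add(a, b):
    return a + b
"""

def run_tests(namespace: Dict[str, Any]) -> bool:
    """Hypothetical test suite: True if every test passes, False if any fails."""
    add = namespace["add"]
    try:
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
    except Exception:  # an assertion failure or a crash both count as a failing test
        return False
    return True

class AddToSub(ast.NodeTransformer):
    """One example mutation operator (an assumption): every '+' becomes '-'."""
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

def load(tree: ast.Module) -> Dict[str, Any]:
    """Compile and execute a (possibly mutated) module, returning its namespace."""
    namespace: Dict[str, Any] = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace

def mutation_run(source: str, operators: List[ast.NodeTransformer]) -> Dict[str, str]:
    # Requirement: a complete, working program whose tests already pass.
    if not run_tests(load(ast.parse(source))):
        raise RuntimeError("tests must pass on the original program first")
    results: Dict[str, str] = {}
    for op in operators:
        mutant = op.visit(ast.parse(source))  # fresh tree, so mutants don't accumulate
        killed = not run_tests(load(mutant))
        results[type(op).__name__] = "killed" if killed else "survived"
    return results

print(mutation_run(SOURCE, [AddToSub()]))
```

Here add(2, 3) returns -1 on the mutant, so the first assertion fails and the mutant is reported as killed.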
Example test functions
 This test achieves 100% code coverage
 testAbs1 will never fail
 Even if abs(-3) == -3
 Mutation testing can detect this
 testAbs1 will initially pass
 It will still pass on mutated code
 It failed to “kill” the mutant
 The test is inadequate
Example test functions
 testAbs2 passes initially
 On the mutated function, it fails
 abs(-5) != 5
 Test is supposed to fail
 It shows the test is robust enough to
catch errors such as this mutation
 It killed the mutant
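The original test functions are not reproduced here. The sketch below is a hypothetical Python reconstruction in the same spirit: abs_test1 and abs_test2 mirror testAbs1 and testAbs2, and the specific mutation (dropping the negation) is an assumption. Both tests execute every line of my_abs, but only abs_test2 checks the result for a negative input.

```python
def my_abs(x):
    if x < 0:
        return -x
    return x

def mutant_abs(x):
    # Hypothetical mutant: the negation on the negative branch was dropped.
    if x < 0:
        return x
    return x

def abs_test1(f):
    f(-3)              # executes the negative branch (full line coverage)...
    assert f(3) == 3   # ...but only checks the positive result

def abs_test2(f):
    assert f(3) == 3
    assert f(-5) == 5  # also checks the negative branch's output

def kills(check, f) -> bool:
    """True if the test fails (raises AssertionError) when run against f."""
    try:
        check(f)
    except AssertionError:
        return True
    return False

for check in (abs_test1, abs_test2):
    print(check.__name__, "original:", kills(check, my_abs), "mutant:", kills(check, mutant_abs))
```

Both tests pass on the original. Only abs_test2 kills the mutant, which is exactly the difference that code coverage alone cannot reveal.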
Mutation Operators
 Types of mutations to be applied
 Types of “changes” to make to the code
 Defined by the testing framework
 Chosen by the user
 Goals
 Introduce errors
 Simulate common coding bugs
 Ensure the tests cover as many circumstances as possible
 Traditional Mutation Operators
 Simple
 Common
 Included in most mutation frameworks
Dropped Statement
 Removes a single statement from
the program
 Detects unnecessary code
 Needs to be selective
 Can produce many possible mutants
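A minimal dropped-statement operator can be sketched with Python's ast module (Python 3.9+ for ast.unparse). To stay selective, it assumes the source holds a single function and deletes only that function's top-level statements, one per mutant. Nested statements, such as the loop body, are deliberately skipped.

```python
import ast
from typing import List

def dropped_statement_mutants(source: str) -> List[str]:
    """One mutant per top-level statement of the first function, with that statement deleted."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    mutants = []
    for i in range(len(func.body)):
        tree = ast.parse(source)      # fresh copy for each mutant
        body = tree.body[0].body
        del body[i]
        if not body:                  # Python forbids an empty body
            body.append(ast.Pass())
        mutants.append(ast.unparse(tree))
    return mutants

SOURCE = '''
def total(items):
    result = 0
    for item in items:
        result += item
    return result
'''

for mutant in dropped_statement_mutants(SOURCE):
    print(mutant, end="\n\n")
```

Deleting "result = 0" makes the function crash. Deleting the return makes it return None. A test suite that notices neither has a gap.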
Arithmetic Operator Replacement
 Swaps arithmetic operators
 +, -, *, /, %
 Some frameworks use a fixed mapping
 e.g. + always becomes *
 Others allow any replacement, or pick one at random
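The fixed-mapping variant can be sketched as follows (Python 3.9+). The replacement table is an illustrative assumption, not any particular framework's table. Each mutant changes exactly one operator occurrence.

```python
import ast
from typing import Dict, List, Type

# Fixed replacement table (an illustrative assumption; real frameworks differ).
AOR_TABLE: Dict[Type[ast.operator], Type[ast.operator]] = {
    ast.Add: ast.Mult,
    ast.Sub: ast.Add,
    ast.Mult: ast.Div,
    ast.Div: ast.Mult,
    ast.Mod: ast.Mult,
}

def _sites(tree: ast.AST) -> List[ast.BinOp]:
    # ast.walk visits nodes in the same order for identical source, so index k
    # refers to the same operator in every fresh parse.
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.BinOp) and type(n.op) in AOR_TABLE]

def aor_mutants(source: str) -> List[str]:
    """One mutant per arithmetic operator, changing only that occurrence."""
    mutants = []
    for k in range(len(_sites(ast.parse(source)))):
        tree = ast.parse(source)
        node = _sites(tree)[k]
        node.op = AOR_TABLE[type(node.op)]()
        mutants.append(ast.unparse(tree))
    return mutants

print(aor_mutants("y = a + b * c"))
```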
Boolean Relation Replacement
 Swaps boolean relations
 ==, !=, <, <=, >, >=
 Tightly constrained form
 Only mutates to/from similar relations
 < to <=
 > to >=
 == to !=
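The tightly constrained form can be sketched as a table that only swaps each relation with its closest neighbour (Python 3.9+, a sketch rather than any tool's implementation). Chained comparisons such as 0 <= x < 10 yield one mutant per operator.

```python
import ast
from typing import Dict, List, Tuple, Type

# Tightly constrained swaps: each relation only becomes its closest neighbour.
ROR_TABLE: Dict[Type[ast.cmpop], Type[ast.cmpop]] = {
    ast.Lt: ast.LtE, ast.LtE: ast.Lt,
    ast.Gt: ast.GtE, ast.GtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}

def _relations(tree: ast.AST) -> List[Tuple[ast.Compare, int]]:
    # A chained comparison holds several operators in one Compare node.
    return [(node, i) for node in ast.walk(tree) if isinstance(node, ast.Compare)
            for i, op in enumerate(node.ops) if type(op) in ROR_TABLE]

def ror_mutants(source: str) -> List[str]:
    """One mutant per relational operator, swapped with its neighbour relation."""
    mutants = []
    for k in range(len(_relations(ast.parse(source)))):
        tree = ast.parse(source)
        node, i = _relations(tree)[k]
        node.ops[i] = ROR_TABLE[type(node.ops[i])]()
        mutants.append(ast.unparse(tree))
    return mutants

print(ror_mutants("ok = 0 <= x < 10"))
```

Swapping < and <= targets off-by-one boundary errors, which is why the constrained form is popular: it keeps the mutant count small while still probing boundaries.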
Boolean Expression Replacement
 Replaces an entire expression with
true or false
 Detects unnecessary code
 Detects code paths that aren’t sufficiently tested
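A sketch of this operator, assuming only if and while conditions are targeted (Python 3.9+): each condition produces one mutant forced to True and one forced to False. Forcing a while condition to True is one way the infinite-loop problem discussed later can arise.

```python
import ast
from typing import List

def _conditions(tree: ast.AST) -> List[ast.stmt]:
    return [n for n in ast.walk(tree) if isinstance(n, (ast.If, ast.While))]

def boolean_expression_mutants(source: str) -> List[str]:
    """For each if/while condition, one mutant forced to True and one forced to False."""
    mutants = []
    for k in range(len(_conditions(ast.parse(source)))):
        for value in (True, False):
            tree = ast.parse(source)
            _conditions(tree)[k].test = ast.Constant(value=value)
            mutants.append(ast.unparse(tree))
    return mutants

for mutant in boolean_expression_mutants("if x > 0:\n    y = 1\nelse:\n    y = 2"):
    print(mutant, end="\n\n")
```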
Variable Replacement
 Replaces a variable reference with
a different variable reference
 The new variable must be
accessible and defined in the
same scope
 Not trying to create compiler errors
 Risk of false positives
 Detects unnecessary/duplicate variables
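A simplified sketch of variable replacement (Python 3.9+). It assumes that only the function's own parameters count as "accessible and defined in the same scope". Every mutant therefore still compiles, which matches the rule that the operator should not create compiler errors.

```python
import ast
from typing import List

def variable_replacement_mutants(source: str) -> List[str]:
    """Replace each read of a parameter with each other parameter of the same function."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    params = [a.arg for a in func.args.args]  # simplification: parameters only

    def reads(tree: ast.AST) -> List[ast.Name]:
        return [n for n in ast.walk(tree)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load) and n.id in params]

    mutants = []
    for k in range(len(reads(ast.parse(source)))):
        for other in params:
            tree = ast.parse(source)
            name = reads(tree)[k]
            if name.id != other:
                name.id = other
                mutants.append(ast.unparse(tree))
    return mutants

for mutant in variable_replacement_mutants("def sub(a, b):\n    return a - b"):
    print(mutant, end="\n\n")
```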
Non-Traditional Operators
 Lots of Operators out there
 Bit operation / Binary operators
 Shift/Rotate replacement
 Increments / Decrements
 Invert Negatives
 Should be able to define your own, customized operators
 Ideally:
 Minimize false positives
 Reasonable number of created mutants
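The "Invert Negatives" operator can be sketched in a few lines (Python 3.9+). Rather than splicing the negated expression into its parent node, this sketch switches unary minus to unary plus, which has the same effect as dropping the negation for numbers. Each unary minus yields one mutant.

```python
import ast
from typing import List

def _negations(tree: ast.AST) -> List[ast.UnaryOp]:
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.UnaryOp) and isinstance(n.op, ast.USub)]

def invert_negatives_mutants(source: str) -> List[str]:
    """One mutant per unary minus; switching it to unary plus drops the negation."""
    mutants = []
    for k in range(len(_negations(ast.parse(source)))):
        tree = ast.parse(source)
        _negations(tree)[k].op = ast.UAdd()
        mutants.append(ast.unparse(tree))
    return mutants

print(invert_negatives_mutants("y = -x * -1"))
```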
Replace inline constants
 Replaces an inline constant
 Numeric or string
 Non-Deterministic
 Tests for responses
more than behavior
 Requires many tests
that may not be
 Used in Heckle (Ruby)
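A sketch of random constant replacement (Python 3.9+). This is not Heckle's actual implementation. It shows why the operator is non-deterministic: each run picks different replacement values. The optional seed parameter is an assumption added here so runs can be reproduced.

```python
import ast
import random
from typing import List, Optional

def _literals(tree: ast.AST) -> List[ast.Constant]:
    return [n for n in ast.walk(tree) if isinstance(n, ast.Constant)
            and isinstance(n.value, (int, float, str))
            and not isinstance(n.value, bool)]

def constant_mutants(source: str, seed: Optional[int] = None) -> List[str]:
    """One mutant per numeric or string literal, replaced by a random different value."""
    rng = random.Random(seed)
    mutants = []
    for k in range(len(_literals(ast.parse(source)))):
        tree = ast.parse(source)
        node = _literals(tree)[k]
        if isinstance(node.value, str):
            node.value = node.value + "".join(rng.choice("abcxyz") for _ in range(3))
        else:
            node.value = node.value + rng.randint(1, 100)  # always differs from the original
        mutants.append(ast.unparse(tree))
    return mutants

print(constant_mutants("rate = 5\nname = 'x'"))
```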
Object-Oriented Operators
 Encapsulation
 Changing access modifiers
 Inheritance
 Hiding variables
 Method overriding
 Parent / super actions
 Polymorphism
 Lots of operators here
 Basic Idea: change something between the parent and
the child in the usage of an object
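As one illustration of changing something between parent and child, the sketch below deletes a child class's overriding method, so calls fall through to the parent's version (Python 3.9+). It only considers parents defined in the same module. It is a simplified sketch, not a reproduction of any tool's operator set.

```python
import ast
from typing import List

def _methods(cls: ast.ClassDef) -> List[str]:
    return [s.name for s in cls.body if isinstance(s, ast.FunctionDef)]

def override_deletion_mutants(source: str) -> List[str]:
    """For each method a subclass overrides from a parent in the same module,
    emit a mutant with the override deleted."""
    classes = {n.name: n for n in ast.parse(source).body if isinstance(n, ast.ClassDef)}
    mutants = []
    for cls_name, cls in classes.items():
        parents = [b.id for b in cls.bases if isinstance(b, ast.Name) and b.id in classes]
        inherited = {m for p in parents for m in _methods(classes[p])}
        for method in _methods(cls):
            if method not in inherited:
                continue
            tree = ast.parse(source)
            target = next(n for n in tree.body
                          if isinstance(n, ast.ClassDef) and n.name == cls_name)
            target.body = [s for s in target.body
                           if not (isinstance(s, ast.FunctionDef) and s.name == method)]
            if not target.body:
                target.body = [ast.Pass()]
            mutants.append(ast.unparse(tree))
    return mutants

SRC = """
class Shape:
    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2
"""
print(override_deletion_mutants(SRC)[0])
```

A test that only checks that Square(3).area() runs without error would not kill this mutant. A test that checks the value 9 would.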
Concurrency Operators
 Modify sleep/wait calls
 Mutual exclusion and semaphores
 Change boundaries of the critical section
 Change synchronization details
 Switch concurrent objects
 Others
 Goal is the same
 Evaluate the adequacy of the tests
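A deliberately narrow sketch of one concurrency operator, "modify sleep/wait calls", assuming calls are spelled time.sleep(...) (Python 3.9+). Each mutant sets one delay to zero. If the tests still pass when a delay that was meant to enforce ordering disappears, they are probably not exercising the timing behaviour.

```python
import ast
from typing import List

def _sleeps(tree: ast.AST) -> List[ast.Call]:
    # Only matches calls written exactly as time.sleep(...); an assumption of this sketch.
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
            and n.func.attr == "sleep"
            and isinstance(n.func.value, ast.Name) and n.func.value.id == "time"]

def sleep_mutants(source: str) -> List[str]:
    """One mutant per time.sleep call, with its delay replaced by 0."""
    mutants = []
    for k in range(len(_sleeps(ast.parse(source)))):
        tree = ast.parse(source)
        _sleeps(tree)[k].args = [ast.Constant(value=0)]
        mutants.append(ast.unparse(tree))
    return mutants

print(sleep_mutants("import time\ntime.sleep(0.5)"))
```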
Computation Time
 The biggest problem/roadblock with mutation testing
 Theory has existed for 40+ years
 Computationally infeasible for use in industry for a long time
 Frameworks can automate almost everything, but:
 Every mutant has to be run against the entire test suite
 Many mutants and a large test suite cause immense
testing times
Computation Time Estimation
 T = Time to run test suite against the code base
 M = # of Mutation Operators in use (3-20+)
 N = # of Mutants per Operator (depends on code base size)
 Mutation testing causes time to increase from T to T*M*N
 Minimum time increase of a factor of 30
 For small code base and few operators
 Total time increases very quickly
 Counts only the time needed to run tests, not compilation
 If T = 1 minute, T*M*N can become hours or days
 If T = 1 hour, T*M*N can become weeks or longer
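The estimate T*M*N is simple arithmetic. The inputs below are illustrative assumptions, not measurements. The first row corresponds to the minimum factor of 30 (e.g. M = 3, N = 10).

```python
def estimated_minutes(t: float, m: int, n: int) -> float:
    """Estimated mutation-testing time T*M*N, in the same unit as T.
    Covers test execution only; compilation is not included."""
    return t * m * n

# Illustrative inputs (assumptions), not measurements.
for t, m, n in [(1, 3, 10), (1, 20, 100), (60, 20, 100)]:
    total = estimated_minutes(t, m, n)
    print(f"T={t} min, M={m}, N={n}: {total:,.0f} min = "
          f"{total / 60:,.1f} h = {total / 1440:,.1f} days")
```

With T = 1 hour, 20 operators and 100 mutants each, the estimate is 120,000 minutes, roughly 83 days or about twelve weeks.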
Addressing Computation Time
 Need to spend less time on mutation testing
 Variety of methods that fall into three categories
 Do Less
 Test fewer mutants and mutation operators
 Need to be careful
 Testing fewer may let poor tests slip through
 Do Faster
 Increase the speed of execution
 Do Smarter
 Eliminate mutants and mutation operators that do not
provide meaningful results
Source code or Byte code?
 Can perform mutations on the source code itself, but:
 Large code bases result in lots of slow disk reads
 Have to compile EVERY mutant
 Instead, compile the original source once
 Mutate the compiled byte code
 Much faster
 Can be difficult to back-trace the byte code to the
source code to show the mutants that were created
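Python offers a rough analogue of "compile once, mutate the compiled form". The sketch below edits the constants table of an already-compiled function's code object with CodeType.replace (Python 3.8+), so the source is never re-parsed or recompiled. It is only an illustration: JVM tools that mutate byte code work on class files and instructions, and this sketch changes constants only.

```python
import types

def greet(name: str) -> str:
    return "Hello, " + name

def mutate_constant(func: types.FunctionType, old, new) -> types.FunctionType:
    """Build a mutant by editing the compiled constants; the source is not recompiled."""
    code = func.__code__
    consts = tuple(new if (type(c) is type(old) and c == old) else c
                   for c in code.co_consts)
    mutated_code = code.replace(co_consts=consts)
    return types.FunctionType(mutated_code, func.__globals__, func.__name__,
                              func.__defaults__, func.__closure__)

mutant_greet = mutate_constant(greet, "Hello, ", "")
print(greet("Bob"), "|", mutant_greet("Bob"))
```

The back-tracing problem is visible even here: the mutant is a code object, and mapping it back to a readable source line takes extra bookkeeping.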
Weak vs. Strong Testing
 Two conditions for killing a mutant
 Testing data should cause a different
program state for the mutant than for
the original
For example, a test results in:
 valid = false
 done = true
 The difference in state should result
in a difference in output and be
checked by the test
In this case: the test should check ‘result’
 Weak Testing: Only satisfy the first condition
 Strong Testing: Both conditions
Weak vs. Strong Testing
 Weak assures that the tests cause a difference
 Not assured that they check the difference
 Not as thorough
 Strong is ideal
 Computationally expensive
 Must always run each test to the end
 Weak can stop as soon as it detects a difference in state
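A toy illustration of the two kill conditions. The functions are hypothetical and return their internal state alongside the output, whereas real tools instrument the program instead. For x = 0, the mutant's state differs (valid is False instead of True). Because done is True, the output is "ok" either way. Under weak testing the mutant is killed; under strong testing it survives.

```python
from typing import Callable, Tuple

State = Tuple[bool, bool]
Program = Callable[[int], Tuple[State, str]]

def original(x: int) -> Tuple[State, str]:
    valid = x >= 0
    done = True
    result = "ok" if (valid or done) else "error"
    return (valid, done), result

def mutant(x: int) -> Tuple[State, str]:
    valid = x > 0          # relational mutation: '>=' became '>'
    done = True
    result = "ok" if (valid or done) else "error"
    return (valid, done), result

def weakly_killed(x: int, f: Program, g: Program) -> bool:
    # Condition 1 only: the internal state differs.
    return f(x)[0] != g(x)[0]

def strongly_killed(x: int, f: Program, g: Program) -> bool:
    # Conditions 1 and 2: the state difference must also reach the checked output.
    return weakly_killed(x, f, g) and f(x)[1] != g(x)[1]

print(weakly_killed(0, original, mutant), strongly_killed(0, original, mutant))
```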
Incremental Analysis
 Currently experimental
 Most useful for long-term projects with large code
bases, which use mutation testing over and over
 Basic Idea: save state and results of tests and code
 Only re-run those tests and mutants for which relevant
code has changed
 Decide which to skip based on changes made
Not perfected yet – can be “tricked” by odd behavior
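A minimal sketch of the incremental idea. Verdicts are saved under a hash of the mutant's source and the test source, and a mutant is re-run only when one of them changes. The run callback is hypothetical. The sketch also shows how such a scheme can be "tricked": anything the key does not cover (other files, configuration, the environment) can change without triggering a re-run.

```python
import hashlib
from typing import Callable, Dict

def fingerprint(*parts: str) -> str:
    """Hash of everything the verdict is assumed to depend on (here: only two source texts)."""
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()

class IncrementalRunner:
    """Re-runs a mutant only when the mutant or the tests changed since last time."""

    def __init__(self, run: Callable[[str, str], bool]) -> None:
        self.run = run                    # hypothetical: (mutant_src, tests_src) -> killed?
        self.saved: Dict[str, bool] = {}  # would be persisted between runs in a real tool
        self.executions = 0

    def killed(self, mutant_src: str, tests_src: str) -> bool:
        key = fingerprint(mutant_src, tests_src)
        if key not in self.saved:
            self.executions += 1
            self.saved[key] = self.run(mutant_src, tests_src)
        return self.saved[key]

runner = IncrementalRunner(lambda m, t: "assert" in t)
runner.killed("y = a - b", "assert f(1) == 2")
runner.killed("y = a - b", "assert f(1) == 2")  # unchanged: verdict reused
print(runner.executions)
```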
Selective Mutation
 Goal: Eliminate some mutants or operators that are not necessary
 Mutants
 Remove duplicates caused by multiple operators
 Remove “likely” duplicates – those that will probably cause
duplicate results
Some amount of error here
 Mutation Operators
 Some pairs of operators might produce many of the same mutants
 An operator might produce a subset of mutants from a
different operator
 Detection can be difficult
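Removing exact duplicates is the easy part of selective mutation. A sketch: compare mutants by their parsed syntax tree, so formatting and comments do not matter, and keep the first operator that produced each one. The operator names in the example are hypothetical. "Likely" duplicates and subsumed operators need more than this.

```python
import ast
from typing import Dict, List, Tuple

def deduplicate(mutants: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop (operator, source) pairs whose code is identical after parsing."""
    seen: Dict[str, str] = {}
    unique = []
    for op, src in mutants:
        key = ast.dump(ast.parse(src))  # no positions or comments, so formatting is ignored
        if key not in seen:
            seen[key] = op
            unique.append((op, src))
    return unique

print(deduplicate([
    ("OP_A", "x = a+b"),
    ("OP_B", "x = a + b  # produced by a different operator"),
    ("OP_C", "x = a < b"),
]))
```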
Coverage Based Test Selection
 Typical mutant only affects a single statement / line
 Typical test suite has many tests that do not execute
this line
 No need to run these tests on the mutant
 Only run tests that exercise the mutated code
 Optimize the running order of tests
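A sketch of coverage-based selection. It assumes a per-test coverage map and per-test durations were recorded earlier on the original code, for example by a coverage tool. The data below is hypothetical. Only tests that execute the mutated line are run, fastest first, and the run stops at the first failure because one failing test is enough to kill the mutant.

```python
from typing import Callable, Dict, List, Set

def select_tests(coverage: Dict[str, Set[int]], durations: Dict[str, float],
                 mutated_line: int) -> List[str]:
    """Tests that execute the mutated line, fastest first."""
    relevant = [name for name, lines in coverage.items() if mutated_line in lines]
    return sorted(relevant, key=lambda name: durations[name])

def is_killed(selected: List[str], run_test: Callable[[str], bool]) -> bool:
    """any() short-circuits, so the run stops at the first failing test."""
    return any(not run_test(name) for name in selected)

# Hypothetical data recorded on the un-mutated code.
coverage = {"test_a": {1, 2, 3}, "test_b": {4, 5}, "test_c": {2, 9}}
durations = {"test_a": 2.0, "test_b": 0.1, "test_c": 0.5}
print(select_tests(coverage, durations, mutated_line=2))
```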
Other Problems and Design Considerations
Equivalent Mutants
 This mutant is functionally equivalent
to the original
 No test that calls this code could ever
distinguish the two
 “Some mutants can’t be killed”
 Can sometimes be detected
automatically and filtered out
 Not all can be detected
 Requires human effort to determine if a mutant is equivalent
 Not always an easy task
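A small illustration of an equivalent mutant (not the one from the original slide). Changing < to <= in an integer absolute-value function only changes behaviour at x == 0, where -0 and 0 are the same integer. No test can kill it. A brute-force comparison like this one gathers evidence but cannot prove equivalence, which is why human judgement is still needed.

```python
def my_abs(x: int) -> int:
    if x < 0:
        return -x
    return x

def mutant_abs(x: int) -> int:
    if x <= 0:       # relational mutation: '<' became '<='
        return -x    # at x == 0 this returns -0, which is just 0 for integers
    return x

# Evidence of equivalence over a sample of inputs, not a proof.
print(all(my_abs(x) == mutant_abs(x) for x in range(-1000, 1001)))
```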
Mutant Infinite Loops
 Some mutations can cause infinite
loops in the mutants
 Statement deletion removed the only
way out of this loop
 Solution: Time the un-mutated code
 If the mutant takes significantly longer
than this time
Probably an infinite loop
Timeout after the un-mutated time, plus
some padding
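A sketch of the timeout strategy. It times the un-mutated code once, then runs each mutant in a child process with a timeout of that time plus some padding. subprocess.run kills the child when the timeout expires. The one-second padding and the tiny example programs are assumptions; the mutant is the statement-deletion case described above.

```python
import subprocess
import sys
import time

ORIGINAL = "i = 0\nwhile i < 3:\n    i += 1\n"
MUTANT = "i = 0\nwhile i < 3:\n    pass\n"  # 'i += 1' deleted: the loop can never exit

def timed_run(code: str) -> float:
    """Run the un-mutated code once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start

def run_mutant(code: str, baseline: float, padding: float = 1.0) -> str:
    """Run a mutant in a child process, stopping it after baseline + padding seconds."""
    try:
        subprocess.run([sys.executable, "-c", code], timeout=baseline + padding)
    except subprocess.TimeoutExpired:  # the child has already been killed at this point
        return "timed out (probable infinite loop)"
    return "finished"

baseline = timed_run(ORIGINAL)
print(run_mutant(ORIGINAL, baseline))
print(run_mutant(MUTANT, baseline))
```

A separate process is used so that a stuck mutant can be killed without taking the test runner down with it.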
Complex Bugs
 Mutation only makes small changes
 What if the test cases miss a large, complex error?
 Mutation doesn’t create complex mutants
 Coupling Effect
 Hypothesis – Tests that detect simple errors are sensitive
enough to detect more complex errors
 Supported by empirical data
 Testing for “simple” errors helps to find the more
complex ones
Fuzz Testing / Fuzzing
 Completely unrelated to Mutation testing
 Often confused with Mutation
 Involves generating random data to use as input to a program
 Tests security/vulnerabilities
 See if the program crashes
 Fuzzing – modifies input
 Checks program behavior
 Mutation – modifies source code
 Checks test case results
 Deterministic
Tools and Environments
 Java
 MuJava
 Bacterio
 Javalanche
 Jumble
 Jester
 C/C++
 Insure++
 Fortran
 Mothra
 PHP
 Mutagenesis
 Ruby
 Heckle
 Mutant
 C#
 Nester
Using Mutation Testing in Industry
 Use Mutation from the beginning of a project
 Don’t use it with “dangerous” methods
 Look for high quality tools/environments
 Speed optimizations
 Reporting/Coverage information
 Configurable Operators
Benefits of Mutation testing
 Evaluation of tests / test suite
 More than code coverage: are the tests adequate?
 Mutation Score: % of non-equivalent mutants killed
 Evaluation of code
 Find unreachable or redundant code
 Find bugs that were hidden through inadequate tests
 Future: Automatic Test Generation
 Create dummy tests, use Mutation to revise
 Repeat until all or most non-equivalent mutants killed
 Still experimental, but promising
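The mutation score above is a simple ratio. A sketch, with illustrative counts that are not from any real project: equivalent mutants are excluded from the denominator because no test could ever kill them.

```python
def mutation_score(killed: int, total: int, equivalent: int) -> float:
    """Percentage of non-equivalent mutants that the tests killed."""
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("no non-equivalent mutants to score")
    return 100.0 * killed / non_equivalent

# Illustrative counts: 100 mutants, 10 judged equivalent, 72 killed.
print(mutation_score(killed=72, total=100, equivalent=10))
```

Here 72 of the 90 killable mutants were killed, a score of 80%.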