Test vs. inspection
Part 2
Tor Stålhane
Testing and inspection
A short data analysis
Testing and inspection – some terms
First we need to understand two important
terms: defect types and triggers.
After this, we will look at inspection data and
test data from three development activities each,
organized by defect type and trigger.
We need the defect categories to compare testing
and inspection: which activity is best at finding what?
Defect categories
This presentation uses eight defect categories:
• Wrong or missing assignment
• Wrong or missing data validation
• Error in algorithm – no design change is necessary
• Wrong timing or sequencing
• Interface problems
• Functional error – design change is needed
• Build, package or merge problem
• Documentation problem
Triggers – 1
It is difficult to focus on several problem areas at
the same time. It is more practical to take one
problem area (area of concern) at a time.
We could, for instance, select “missing
exceptions” as a trigger. We would then go
through all the code, look for every place
where an exception should be raised or handled,
and check that the handling has been inserted.
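Part of a single-trigger pass like “missing exceptions” can be mechanized. The sketch below is a hypothetical illustration, not a tool from the study: it assumes that calls to `open`, `int` and `float` (the set `RISKY_CALLS` is invented for this example) must always sit inside a `try` body, and it uses Python's standard `ast` module to list the calls that do not.

```python
import ast

# Hypothetical trigger: for this sketch we assume these built-in calls
# can raise and should always sit inside a try block.
RISKY_CALLS = {"open", "int", "float"}


def unguarded_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each risky call that is not inside a try body."""
    findings: list[tuple[int, str]] = []

    def visit(node: ast.AST, guarded: bool) -> None:
        if isinstance(node, ast.Try):
            # Only statements in the try body are protected by its handlers.
            for stmt in node.body:
                visit(stmt, True)
            for part in (*node.handlers, *node.orelse, *node.finalbody):
                visit(part, guarded)
            return
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
            and not guarded
        ):
            findings.append((node.lineno, node.func.id))
        for child in ast.iter_child_nodes(node):
            visit(child, guarded)

    visit(ast.parse(source), False)
    return findings


SAMPLE = """
def load(path):
    f = open(path)
    try:
        n = int(f.readline())
    except ValueError:
        n = 0
    return n
"""

print(unguarded_risky_calls(SAMPLE))
```

The check is deliberately simple: it only flags the one area of concern, which is the point of using a trigger. It also has blind spots, for example it treats code in a function defined inside a `try` body as guarded, so a human inspector still has to judge each finding.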
Triggers – 2
In general we will use the term “trigger” for:
• A goal to be achieved – e.g. understanding
something
• Something to be checked – e.g. conformance
to a standard
• Something to look for – e.g. side effects
Triggers – 3
We will use different triggers for testing and
inspection. In addition, white box and black
box tests will use different triggers.
There is no standard definition of which terms
should be used for triggers.
The triggers suggested in the following slides are
examples of triggers that have worked well for
others.
Inspection triggers
• Design conformance
• Understanding details
– Operation and semantics
– Side effects
– Concurrency
• Backward compatibility – earlier versions of this
system
• Lateral compatibility – other, similar systems
• Rare situations
• Document consistency and completeness
• Language dependencies
Test triggers – black box
• Test coverage
• Sequencing – two code chunks in sequence
• Interaction – two code chunks in parallel
• Data variation – variations over a simple test
case
• Side effects – unanticipated effects of a simple
test case
Test triggers – white box
• Simple path coverage
• Combinational path coverage – same path
covered several times but with different
inputs
• Side effects – unanticipated effects observed
during simple path coverage
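The difference between simple and combinational path coverage can be shown on a small function. In this sketch, `shipping_cost` and its test cases are invented for illustration; the function records the branches it takes in `trace`, so we can count how often each path is exercised. The simple suite hits each of the four paths once; the combinational suite runs some of the same paths again with different inputs.

```python
from collections import Counter


def shipping_cost(weight_kg: float, express: bool, trace: list[str]) -> float:
    """Hypothetical unit under test; `trace` records which branches were taken."""
    if weight_kg <= 1.0:
        trace.append("light")
        cost = 5.0
    else:
        trace.append("heavy")
        cost = 5.0 + 2.0 * (weight_kg - 1.0)
    if express:
        trace.append("express")
        cost *= 2
    else:
        trace.append("normal")
    return cost


# Test cases: (weight_kg, express, expected cost).
# Simple path coverage: one case per path through the function.
SIMPLE = [(0.5, False, 5.0), (0.5, True, 10.0), (3.0, False, 9.0), (3.0, True, 18.0)]
# Combinational path coverage: the same paths again, with different inputs.
COMBINATIONAL = SIMPLE + [(1.0, False, 5.0), (1.5, True, 12.0), (10.0, False, 23.0)]


def run(cases: list[tuple[float, bool, float]]) -> Counter:
    """Run the cases, check each result, and count how often each path was taken."""
    paths: Counter = Counter()
    for weight_kg, express, expected in cases:
        trace: list[str] = []
        assert shipping_cost(weight_kg, express, trace) == expected
        paths[tuple(trace)] += 1
    return paths


print(run(SIMPLE))
print(run(COMBINATIONAL))
```

The extra case with `weight_kg = 1.0` follows an already covered path but also probes the boundary of the first decision, which is the kind of defect simple path coverage can miss.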
Testing and inspection – the V model
Inspection data
We will look at inspection data from three
development activities:
• High level design: architectural design
• Low level design: design of subsystems,
components – modules – and data models
• Implementation: realization, writing code
This is the left-hand side of the V-model.
Test data
We will look at test data from three
development activities:
• Unit testing: testing a small unit like a method
or a class
• Function verification testing: functional testing
of a component, a system or a subsystem
• System verification testing: testing the total
system, including hardware and users.
This is the right-hand side of the V-model.
What did we find?
The next tables show, for each of the
development activities, the following
information:
• Development activity
• The three most common defect types
• The three most efficient triggers
First for inspection and then for testing.
Inspection – defect types
Activity            Defect type      Percentage
High level design   Documentation    45.10
                    Function         24.71
                    Interface        14.12
Low level design    Algorithm        20.72
                    Function         21.17
                    Documentation    20.27
Code inspection     Algorithm        21.62
                    Documentation    17.42
                    Function         15.92
Inspection – triggers
Activity            Trigger                Percentage
High level design   Understand details     34.51
                    Document consistency   20.78
                    Backward compatible    19.61
Low level design    Side effects           29.73
                    Operation semantics    28.38
                    Backward compatible    12.16
Code inspection     Operation semantics    55.86
                    Document consistency   12.01
                    Design conformance     11.41
Testing – triggers and defects
Activity                 Trigger           Percentage
Implementation testing   Test sequencing   41.90
                         Test coverage     33.20
                         Side effects      11.07

Activity                 Defect type               Percentage
Implementation testing   Interface                 39.13
                         Assignments               17.79
                         Build / Package / Merge   14.62
Some observations – 1
• Pareto’s rule applies in most cases, both
for defect types and for triggers: a few
categories account for most of the findings
• Defects related to documentation and
functions taken together are the most
commonly found defect types in inspection
– HLD: 69.81%
– LLD: 41.44%
– Code: 33.34%
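The percentages above are sums of the documentation and function rows in the inspection defect-type table. A small check, with the table values copied by hand, reproduces them and also prints the share covered by the three listed defect types in each activity.

```python
# Top three defect types per activity, copied from the inspection table (percent).
INSPECTION_DEFECTS = {
    "High level design": {"Documentation": 45.10, "Function": 24.71, "Interface": 14.12},
    "Low level design": {"Algorithm": 20.72, "Function": 21.17, "Documentation": 20.27},
    "Code inspection": {"Algorithm": 21.62, "Documentation": 17.42, "Function": 15.92},
}


def documentation_plus_function(shares: dict[str, float]) -> float:
    """Share of defects that are documentation or function defects, in percent."""
    return round(shares["Documentation"] + shares["Function"], 2)


def top_three_share(shares: dict[str, float]) -> float:
    """Share of all defects covered by the three listed types, in percent."""
    return round(sum(shares.values()), 2)


for activity, shares in INSPECTION_DEFECTS.items():
    print(
        f"{activity}: documentation + function = "
        f"{documentation_plus_function(shares)}%, top three = {top_three_share(shares)}%"
    )
```

The three most common of the eight categories cover between about 55% (code) and 84% (high level design) of the inspection defects, which is the Pareto pattern noted in the first bullet.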
Some observations – 2
• The only defect type that is among the top
three for both testing and inspection is
“Interface”
– Inspection – HLD: 14.12%
– Testing: 39.13%
• The only trigger that is among the top three
for both testing and inspection is “Side effects”
– Inspection – LLD: 29.73%
– Testing: 11.07%
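Both observations are set intersections: take the union of the top three entries over all inspection activities and intersect it with the top three for testing. The sets below are copied from the tables; the function name is only illustrative.

```python
# Top three entries per inspection activity, copied from the inspection tables.
INSPECTION_TOP_DEFECTS = {
    "High level design": {"Documentation", "Function", "Interface"},
    "Low level design": {"Algorithm", "Function", "Documentation"},
    "Code inspection": {"Algorithm", "Documentation", "Function"},
}
INSPECTION_TOP_TRIGGERS = {
    "High level design": {"Understand details", "Document consistency", "Backward compatible"},
    "Low level design": {"Side effects", "Operation semantics", "Backward compatible"},
    "Code inspection": {"Operation semantics", "Document consistency", "Design conformance"},
}
# Top three entries for implementation testing, copied from the testing tables.
TESTING_TOP_DEFECTS = {"Interface", "Assignments", "Build / Package / Merge"}
TESTING_TOP_TRIGGERS = {"Test sequencing", "Test coverage", "Side effects"}


def shared_with_testing(inspection_top: dict[str, set[str]], testing_top: set[str]) -> set[str]:
    """Entries that appear in some inspection top three and in the testing top three."""
    return set().union(*inspection_top.values()) & testing_top


print(shared_with_testing(INSPECTION_TOP_DEFECTS, TESTING_TOP_DEFECTS))
print(shared_with_testing(INSPECTION_TOP_TRIGGERS, TESTING_TOP_TRIGGERS))
```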
Summary
Testing and inspection are different activities. By
and large, they
• Need different triggers
• Use different mind sets
• Find different types of defects
Thus, we need both activities to get a
high-quality product.
Inspection as a social process
Inspection is a people-intensive process. Thus,
we cannot consider only technical details – we
also need to consider how people
• Interact
• Cooperate
Data sources
We will base our discussion on data from two
sets of experiments:
• UNSW – three experiments with 200 students.
Focus was on process gain versus process loss.
• NTNU – two experiments
– NTNU 1 with 20 students. Focus on group size
and the use of checklists.
– NTNU 2 with 40 students. Focus on detection
probabilities for different defect types.
The UNSW data
The programs inspected were
• 150 lines long with 19 seeded defects
• 350 lines long with 38 seeded defects
1. Each student inspected the code individually and
turned in an inspection report.
2. The students were randomly assigned to one of
40 groups, with three persons per group.
3. Each group inspected the code together and
turned in a group inspection report.
Gain and loss - 1
In order to discuss process gain and process loss,
we need two terms:
• Nominal group (NG) – a group of persons that
will later participate in a real group but are
currently working alone.
• Real group (RG) – a group of people in direct
communication, working together.
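To compare a nominal group with the real group, we need a score for each. The sketch below assumes the usual convention that the nominal group's score is the number of distinct defects found by at least one member working alone, and that the real group's score is the size of its joint report. The defect ids are invented. A positive NG – RG means the meeting lost defects (process loss); a negative value means it gained some.

```python
# Hypothetical data for one three-person group (defect ids are invented):
# the defects each member found alone, and the defects in the group report.
individual_reports = [{1, 2, 5, 8}, {2, 3, 8}, {2, 9}]
group_report = {2, 3, 8, 11}


def nominal_group_score(reports: list[set[int]]) -> int:
    """NG: distinct defects found by at least one member working alone (assumed)."""
    return len(set().union(*reports))


def real_group_score(report: set[int]) -> int:
    """RG: defects in the report the group produced together."""
    return len(report)


def classify(ng_minus_rg: int) -> str:
    # Positive NG - RG: the meeting lost defects; negative: it gained some.
    if ng_minus_rg > 0:
        return "process loss"
    if ng_minus_rg < 0:
        return "process gain"
    return "no difference"


diff = nominal_group_score(individual_reports) - real_group_score(group_report)
print(diff, classify(diff))
```

Here the members found six distinct defects alone, but the group reported only four, so this hypothetical group shows a process loss of two defects.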
Gain and loss - 2
The next diagram shows the distribution of the
difference NG – RG. Note that:
• Process loss can be as large as 7 defects
• Process gain can be as large as 5 defects
Thus, there are large opportunities and large
dangers.
Gain and loss - 3
[Figure: distribution of the difference NG – RG per group, from 7 to -6 defects, for Exp 1, Exp 2 and Exp 3]
Gain and loss - 4
If we pool the data from all experiments, we find
that the probability of:
• Process loss is 53%
• Process gain is 30%
Thus, if we must choose, it is better to drop the
group part of the inspection process.
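Pooling means putting the NG – RG values of all groups from all experiments into one list and counting how often each sign occurs. The values below are invented for illustration and are not the UNSW data, which are only available as the diagram.

```python
def pooled_probabilities(*experiments: list[int]) -> dict[str, float]:
    """Pool NG - RG values from several experiments and estimate each outcome's probability."""
    diffs = [d for experiment in experiments for d in experiment]
    n = len(diffs)
    return {
        "loss": sum(d > 0 for d in diffs) / n,
        "gain": sum(d < 0 for d in diffs) / n,
        "no difference": sum(d == 0 for d in diffs) / n,
    }


# Invented NG - RG values, one per group; NOT the UNSW data.
exp1 = [3, 1, -2, 0, 5]
exp2 = [2, -1, 0, 4]
exp3 = [-3, 1, 7, 0]

print(pooled_probabilities(exp1, exp2, exp3))
```

With the real pooled data, the reported 53% loss and 30% gain leave about 17% of the groups with no difference between NG and RG.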
Reporting probability - 1
[Figure: probability (0 to 1) that a defect is reported, for RG 1, RG 2 and RG 3, by how many nominal group members found it during preparation: NG = 0, NG = 1, NG = 2, NG > 2]
Reporting probability - 2
There is a 10% probability that the group reports a
defect even if nobody found it during preparation –
a group effect.
There is an 80% to 95% probability that the group
reports a defect that everybody in the nominal
group found during preparation.
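Each reporting probability is a conditional frequency: for every defect, count how many nominal group members found it during preparation, then compute the fraction of such defects that the real group reported. The observations below are invented and only shaped to resemble the percentages in the text; the bucket labels follow the diagram.

```python
from collections import defaultdict


def bucket(finders: int) -> str:
    # In three-person groups, "NG > 2" means every member found the defect.
    return "NG > 2" if finders > 2 else f"NG = {finders}"


def reporting_probability(observations: list[tuple[int, bool]]) -> dict[str, float]:
    """observations: (members who found the defect alone, reported by the group?)."""
    reported: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for finders, was_reported in observations:
        key = bucket(finders)
        total[key] += 1
        reported[key] += int(was_reported)
    return {key: reported[key] / total[key] for key in total}


# Invented observations, one per (defect, group) pair; not experimental data.
observations = (
    [(0, False)] * 9 + [(0, True)]
    + [(1, False)] * 3 + [(1, True)] * 2
    + [(2, False)] + [(2, True)] * 4
    + [(3, False)] + [(3, True)] * 9
)

print(reporting_probability(observations))
```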
Reporting probability - 3
The table and diagram open up two
possible interpretations:
• We have a, possibly silent, voting process. The
majority decides what is reported from the
group and what is not.
• The defect reporting process is controlled by
group pressure. If nobody else has found it, it
is hard for a single person to get it included in
the final report.
A closer look - 1
The next diagram shows that when we have
• Process loss, we find few new defects during
the meeting but remove many
• Process gain, we find many new defects
during the meeting but remove just a few
• Process stability, we find and remove roughly
the same number of defects during the meeting.
New, retained and removed defects
[Figure: number of new, retained and removed defects (0 to 50) for groups with RG > NG, RG = NG and RG < NG]
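The three categories can be computed with set operations. This sketch assumes that “new” means reported by the group although nobody found it during preparation, “retained” means found during preparation and kept in the report, and “removed” means found during preparation but left out; the defect ids are invented. Under these definitions NG – RG equals removed minus new, which is why process loss goes with few new and many removed defects.

```python
def meeting_effect(
    individual_reports: list[set[int]], group_report: set[int]
) -> tuple[set[int], set[int], set[int]]:
    """Split defects into new, retained and removed by the meeting (assumed definitions)."""
    found_alone = set().union(*individual_reports)
    new = group_report - found_alone       # reported, but nobody found it alone
    retained = group_report & found_alone  # found alone and kept in the report
    removed = found_alone - group_report   # found alone but left out of the report
    return new, retained, removed


# Hypothetical three-person group; defect ids are invented.
individual_reports = [{1, 2, 5, 8}, {2, 3, 8}, {2, 9}]
group_report = {2, 3, 8, 11}

new, retained, removed = meeting_effect(individual_reports, group_report)
print("new:", new, "retained:", retained, "removed:", removed)
```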
A closer look - 2
It seems that groups can be split according to
the following characteristics:
• Process gain
– All individual contributions are accepted.
– Find many new defects.
• Process loss
– Minority contributions are ignored
– Find few new defects.
A closer look - 3
A group with process loss is doubly negative.
It rejects minority opinions and thus most of the
defects found by just a few of the participants
during:
• Individual preparation.
• The group meeting.
The participants can be good at finding defects –
the problem is the group process.
The NTNU-1 data
We had 20 students in the experiment. The
program to inspect was 130 lines long. We
seeded 13 defects in the program.
1. We used groups of two, three and five
students.
2. Half the groups used a tailored checklist.
3. Each group inspected the code and turned in
an inspection report.
Group size and check lists - 1
We studied two effects:
• The size of the inspection team. Small groups
(2 persons) versus large groups (5 persons)
• The use of checklists or not
In addition we considered the combined effect –
the factor interaction.
DoE table
Group size (A)   Use of checklists (B)   A × B   Number of defects reported
-                -                       +       7
-                +                       -       9
+                -                       -       13
+                +                       +       11
Group size and check lists - 2
Simple arithmetic gives us the following results:
• Group size effect – large vs. small – is 4.
• Checklist effect – use vs. no use – is 0.
• Interaction – groups where both factors are at the
same level (small without checklists, large with
checklists) vs. the mixed combinations – is -2.
The standard deviation is 1.7. Using two standard
deviations (3.4), roughly a 5% significance level,
as the threshold rules out everything but group size.
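The “simple arithmetic” is the standard effect estimate for a two-level factorial design: the mean number of defects at the + level of a column minus the mean at the - level. The sketch assumes that - means small group / no checklist and + means large group / checklist, and applies the slide's rule that an effect larger than two standard deviations (2 × 1.7) counts as significant.

```python
from typing import Callable

# Runs from the DoE table: (A = group size, B = checklist, defects reported).
# Assumed coding: -1 = small group / no checklist, +1 = large group / checklist.
RUNS = [(-1, -1, 7), (-1, +1, 9), (+1, -1, 13), (+1, +1, 11)]
SD = 1.7  # standard deviation as given on the slide


def effect(sign_of: Callable[[int, int], int]) -> float:
    """Mean response where the column sign is + minus mean response where it is -."""
    plus = [y for a, b, y in RUNS if sign_of(a, b) > 0]
    minus = [y for a, b, y in RUNS if sign_of(a, b) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


effects = {
    "A (group size)": effect(lambda a, b: a),
    "B (checklist)": effect(lambda a, b: b),
    "AB (interaction)": effect(lambda a, b: a * b),
}

for name, value in effects.items():
    verdict = "significant" if abs(value) > 2 * SD else "not significant"
    print(f"{name}: {value:+.1f} ({verdict})")
```

The interaction column is the product of the A and B signs, so its + level collects the two runs where both factors are at the same level.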
The NTNU-2 data
We had 40 students in the experiment. The
program to inspect was 130 lines long. We
seeded 12 defects in the program.
1. We had 20 PhD students and 20 third year
software engineering students.
2. Each student inspected the code individually
and turned in an inspection report.
Defect types
Each of the 12 seeded defects was of one of the
following types:
• Wrong code – e.g. wrong parameter
• Extra code – e.g. unused variable
• Missing code – e.g. no exception handling
There were four defects of each type.
How often is each defect found?
[Figure: fraction of inspectors (0 to 0.9) who found each seeded defect, for the low experience and high experience groups; defects in the order D3, D4, D8, D10, D2, D5, D9, D12, D1, D6, D7, D11]
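The plot shows, for each seeded defect, the fraction of inspectors in each experience group whose individual report contains that defect. A sketch of that computation, with invented reports and only a few of the defect ids, could look like this; it is not the NTNU 2 data.

```python
# Invented individual reports: the set of seeded defects each inspector found.
reports = {
    "low experience": [{"D1", "D3", "D6"}, {"D1", "D7"}, {"D3", "D11"}, {"D1", "D6", "D7"}],
    "high experience": [{"D2", "D3"}, {"D2", "D5", "D9"}, {"D3"}, {"D5", "D12"}],
}


def detection_rate(group_reports: list[set[str]], defect: str) -> float:
    """Fraction of inspectors in the group whose report contains the defect."""
    return sum(defect in report for report in group_reports) / len(group_reports)


for defect in ["D1", "D2", "D3"]:
    rates = {group: detection_rate(rs, defect) for group, rs in reports.items()}
    print(defect, rates)
```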
Who finds what – and why
First and foremost we need to clarify what we
mean by high and low experience.
• High experience – PhD students.
• Low experience – third and fourth year
students in software engineering.
High experience, in our case, turned out to
mean less recent hands-on development
experience.
Hands-on experience
The plot shows us that:
• People with recent hands-on experience are
better at finding missing code
• People with more engineering education are
better at finding extra – unnecessary – code.
• Experience does not matter when finding
wrong code statements.
