Group R

Report
INFO 631
Prof. Glenn Booker
Week 3 – Complexity Metrics
and Models
INFO631 Week 3
1
www.ischool.drexel.edu
Origin
• Complexity metrics were developed by
computer scientists and software
engineers
• Strongly based on empirical (real world)
measurement, with little theory
• Primarily broken into internal and
external measures
INFO631 Week 3
2
www.ischool.drexel.edu
Internal versus External
• Internal measures describe the complexity
within a module (number of decisions,
loops, calculations, etc.)
• External measures describe relationships
among modules (program or function calls,
external file activities, input/output, etc.)
INFO631 Week 3
3
www.ischool.drexel.edu
Internal Measures
INFO631 Week 3
4
www.ischool.drexel.edu
Internal Product Attributes
• Size measures
– Input to prediction models
– Normalizing factor for cost, productivity, etc.
– Progress during development
• Typically use lines of code (LOC) or
function point counts;
– LOC is a better measure for predicting cost
and schedule
INFO631 Week 3
5
www.ischool.drexel.edu
Lines of Code
• Simple complexity metric, often based on
number of executable statements or instruction
statements
– Highest defect rates often occurs in small
modules
– Larger modules have a smaller defect rate
(if they exist at all) - until too cumbersome
– Optimum module size ~ 250 lines
INFO631 Week 3
6
www.ischool.drexel.edu
Function Points
• Function points help avoid biases due to
the programming language(s) used
• Provide a more “fair” basis for comparing
different environments
• Focuses on how much work the
program accomplishes, not how concisely
it is expressed
INFO631 Week 3
7
www.ischool.drexel.edu
Halstead Metrics
• Also known as Software Science, 1977
• Examine program as compilable “tokens”
• Tokens are either operators (+, -) or operands
(variables)
• Derive metrics such as Vocabulary, Length, Volume,
Difficulty, etc.
• Not widely used
INFO631 Week 3
8
www.ischool.drexel.edu
Data Structure (Halstead)
• Halstead’s 2 - number of distinct
operands in a module
– Operands include: number of variables,
number unique constants, and number of
labels
• Operand usage (OU)
– OU = 2/N2 where N2 is the total number of
operand references
INFO631 Week 3
9
www.ischool.drexel.edu
Software Complexity
• Is a characteristic that influences the
resources needed to build and maintain it
• Many different characteristics of software
relate to complexity
• These complexity characteristics revolve
around the structure of the software
INFO631 Week 3
10
www.ischool.drexel.edu
Types of Structural Measures
• Control flow
– Addresses sequence in which instructions are
executed
– Iteration and looping
• Data flow
– Follows trail of data as it is created and
handled
– Depicts behavior of data as it interacts with
the program
INFO631 Week 3
11
www.ischool.drexel.edu
Types of Structural Measures
• Data structure
– Concerned with organization of data itself
– Provides information about difficulties in
handling data and in defining test cases
INFO631 Week 3
12
www.ischool.drexel.edu
Control Flow
• Modeled by directed graphs (control
flow graphs)
– Each node corresponds to a single program
statement
– Arcs (directed edges) indicate flow of control
from one statement to another
INFO631 Week 3
13
www.ischool.drexel.edu
Control Flow
• Control flow graphs are useful for:
– Analysis (estimating number of defects)
– Expressing complexity by a single value
– Assessing testability and test coverage
INFO631 Week 3
14
www.ischool.drexel.edu
Basic Control Constructs
If A then X
A
t
f
X
While A do X
If A then X else Y
A
t
f
X
Y
Case A of
a1 : X 1
.
.
an : X n
Repeat X until A
A
a1
X
A
an
a2
f
f
t
X
Note: t=true
f=false
t
X1
X2
...
Xn
A
INFO631 Week 3
15
www.ischool.drexel.edu
Cyclomatic Complexity
• McCabe, 1976
• Based on a program’s control flow chart
• Related to number of separate graphable
areas, or number of linearly independent
paths in the program
• Complexity MC = edges - nodes +
2*(# of unconnected paths)
INFO631 Week 3
16
www.ischool.drexel.edu
Cyclomatic Complexity
• Complexity under 10 generally desired
• Can also find M as number of binary
decisions (yes/no) minus one
– Multiple choice decisions with ‘n’ choices
count as (n-1) binary decisions
• Ignores differences among specific types
of control structures
INFO631 Week 3
17
www.ischool.drexel.edu
Cyclomatic Complexity
• Uses of complexity metric:
– Identify complex modules needing detailed
inspection or redesign
– Identify simple modules needing minimal
inspection and/or testing
– Estimate programming, testing and
maintenance effort
– Identify potentially troublesome code
INFO631 Week 3
18
www.ischool.drexel.edu
Control Flow Representation
of Programs
• Software programs can be represented by
linear directed segments combined with
the basic control flow constructs
• Control flow constructs may be nested,
e.g. an IF statement can be inside of a
WHILE loop
INFO631 Week 3
19
www.ischool.drexel.edu
Control Flow Representation
of Programs
• Example:
1
2
McCabe cyclomatic complexity (MC) - counts
the number of linearly independent paths
through a program
4
10
MC = # of edges - # of nodes +2
3
5
13
6
7
8
14
9
12
11
Linearly independent paths for example
<2, 11>
<2, 10, 12, 14>
<2, 10, 12, 13, 12, 14>
<1, 3, 5, 6, 9>
<1, 4, 6,9>
<1, 4, 6, 7, 8, 9>
INFO631 Week 3
20
www.ischool.drexel.edu
Control Flow--Linearly
Independent Paths
a
1
2
d
b
6
5
3
4
7
e
8
c
f
9
10
g
MC = edges - nodes + 2
= 10 - 7 + 2 = 5
Set of linearly independent paths:
b1: abcg
b2: abcbcg
b3: abefg
b4: adefg
b5: adfg
Any arbitrary path is equal to a linear
combination of the linearly
independent paths listed above
For example, path abcbefg is equal to:
b2 + b3 - b1
INFO631 Week 3
21
www.ischool.drexel.edu
Knots - Control Flow Crossovers
• Knot measure -- total number of points
at which control flow lines cross
10
20
30
40
50
IF (TIME) 30,30,10
CALL TEMP1
IF (X1) 20,20,40
Y1=Y+1
Y2=0
CALL TEMP2
GO TO 50
Z1=1
CALL TEMP3
Z2=Z2+1
CALL TEMP4
INFO631 Week 3
How many
are here?
22
www.ischool.drexel.edu
Syntactic Constructs
• Examine effect of using specific control
structures on defect rate
• Is, by definition, language-specific
• Can result in statistically significant
relationships
– e.g. Lo used to show that DO WHILE should
be avoided in COBOL
INFO631 Week 3
23
www.ischool.drexel.edu
External Measures
INFO631 Week 3
24
www.ischool.drexel.edu
Computational Complexity
• Examines algorithmic efficiency and use of
machine resources (memory, I/O, storage)
• Studies quantitative aspects of solutions to
computational problems
• Examples may include sorting efficiency
for a database, managing I/O constraints
across a large scale network, etc.
INFO631 Week 3
25
www.ischool.drexel.edu
Psychological Complexity
• Concerned with characteristics of software
that affect human performance
- Injection of defects (when and why does a
programmer make errors?)
- Ease of building the software (effort required)
- Ease of maintenance (effort required)
INFO631 Week 3
26
www.ischool.drexel.edu
Data Structure (Database)
• Database size per program size
(DBSPPS)
– DBSPPS = DBS/PS
• Where DBS is database size in bytes or characters
• PS is program size in source instructions
– Used in COCOMO model as a cost driver
• Ordinal scale measure derived from DBSPPS
INFO631 Week 3
27
www.ischool.drexel.edu
Fan-in and Fan-out
• Focus is the interaction among code
modules
– Fan-in = # of modules which call a given
module
– Fan-out = # of modules which are called by a
given module
• Or, more formally...
INFO631 Week 3
28
www.ischool.drexel.edu
Fan-in and Fan-out
• Fan-in of a module is the number of local flows
terminating at the module, plus the number of data
structures from which info is retrieved by the module
• Fan-out of a module is the number of local flows
that emanate from the module, plus the number of
data structures (tables, arrays) that are updated by
the module
INFO631 Week 3
29
www.ischool.drexel.edu
Fan-in and Fan-out
• Do fan-in and fan-out affect software
quality?
– Large fan-in modules may be interpolation or
look-up routines - no defect correlation
– Large fan-out often relates to high defect rate
- has a high defect correlation
• Is large fan-in and fan-out bad?
INFO631 Week 3
30
www.ischool.drexel.edu
Fan-in and Fan-out
• Information flow complexity
– Henry and Kafura: Size*(fan-in * fan-out)2
– Shepperd: (fan-in * fan-out)2
• Henry and Kafura measure helps predict
the number of software maintenance
problems
Henry, S. and D. Kafura, IEEE Transactions on Software Engineering, 1981. SE-7(5): p. 510-518
Shepperd, M. 1990. Software Engineering Journal 5, 1 (January), pp. 3-10.
INFO631 Week 3
31
www.ischool.drexel.edu
Structure Metrics
• Shepperd measure correlates with
software development time
• Information flow metric (Henry & Selig)
HC = C * (fan-in * fan-out)^2
– where C is the cyclometric complexity
INFO631 Week 3
32
www.ischool.drexel.edu
Structure Metrics
• System complexity (Card & Glass)
– Based on structural complexity (average fanout squared) and data complexity (based on
number of I/O variables and fan-out)
– Quantified effect of complexity on error rate
INFO631 Week 3
33
www.ischool.drexel.edu
Module Call Graph
• Module - a contiguous sequence of
program statements, bounded by
boundary elements, having an aggregate
identifier
– Or, a distinct, named group of LOC
• The module call graph shows which
modules call each other, and what key
information is passed among them
INFO631 Week 3
34
www.ischool.drexel.edu
Module Call Graph example
Main
scores
scores
eof
Read_Scores
Average
scores
average
average
Find_Ave
INFO631 Week 3
Print_Ave
35
www.ischool.drexel.edu
Module Coupling Measures
• Average number of calls per module
(ANCPM)
• Fraction of modules that make calls
(FMC)
Number of Interconnections
ANCPM =
Number of Modules
Number of Modules that call
FMC =
Number of Modules
INFO631 Week 3
36
www.ischool.drexel.edu
Information Flow Measures
• Types of information flows
– Local direct flow
• Module invokes a 2nd module & passes info to it
• Invoked module returns result to the caller
– Local indirect flow
• Invoked module returns info that is subsequently passed
to a second invoked module
– Global flow
• Info flows from one module to another via a global
data structure
INFO631 Week 3
37
www.ischool.drexel.edu
IEEE-STD-982
• Number of Entries and Exits per Module,
‘m’
– Like fan-in and fan-out
m = entries + exits
• Software Science measures
INFO631 Week 3
38
www.ischool.drexel.edu
IEEE-STD-982
• Graph-Theoretic Complexity
– Static Complexity
C = Edges - Nodes + 1
– Generalized Static Complexity
Based on summing resources needed for
each module (e.g. storage, access time, etc.)
– Dynamic complexity
Complexity as it changes over time across a
network
INFO631 Week 3
39
www.ischool.drexel.edu
IEEE-STD-982
• Cyclomatic complexity
• Minimal Unit Test Case Determination
– Determine number of independent paths
through a module, to get minimum number
of test cases for unit testing
• Data or information flow complexity
– Fan-in and fan-out of variables
INFO631 Week 3
40
www.ischool.drexel.edu
IEEE-STD-982
•
Design Structure adds weighted (%)
average of six parameters:
1.
2.
3.
4.
5.
6.
Whether designed top down (Y/N)
Module inter-dependence
Module dependence on prior processing
Database size (# of elements)
Database compartmentalization
Module single entrance and exit (Y/N)
– Weighting is chosen to meet project needs
INFO631 Week 3
41
www.ischool.drexel.edu
Other Measures
• Compiler measures
– Size (bytes of compiled code)
– Number of symbols and variables
– Cross-reference of all labels
– Statement count
INFO631 Week 3
42
www.ischool.drexel.edu
Other Measures
• Configuration Management Library
Measures
– Number of code modules
– Number of versions of each module
– History of change dates of each module
– Module size
– Number of related documents for each
module
INFO631 Week 3
43
www.ischool.drexel.edu
Availability Metrics
• Most information systems are critical to
day-to-day operations
– Witness Google or Blackberry being offline
for mere minutes is news
• Availability depends on 1) how often the
system goes down, and 2) how long it
takes to restore it after a crash
INFO631 Week 3
44
www.ischool.drexel.edu
Availability Metrics
• Perfect availability (100%) is nice to dream
of, but realistically, higher reliability is
more expensive
• Often measure availability by the number
of 9’s in the desired level of availability
– Two nines is 99%, three nines is 99.9%, four
nines is 99.99%, etc.
– How many nines can you afford?
INFO631 Week 3
45
www.ischool.drexel.edu
Availability Metrics
No. of 9’s
Availability
Down time
per year
2
99%
87.6 hours
3
99.9%
8.8 hours
4
99.99%
53 minutes
5
99.999%
5.3 minutes
INFO631 Week 3
46
www.ischool.drexel.edu
Achieving High Availability
• Many techniques are used to help ensure
that high levels of availability are possible
– Duplicate systems (clustering)
– RAID data duplication
– Duplicate power supplies
– Independent power supplies
– Uninterruptible power supplies (UPS’)
INFO631 Week 3
47
www.ischool.drexel.edu
Availability and Code Quality
• Capers Jones demonstrated a clear
connection between code quality (defect
rate) and the corresponding mean time to
failure (MTTF), which is a key aspect of
availability
– Consistent methods for measurement and
definitions of terms are needed for further
refinement
INFO631 Week 3
48
www.ischool.drexel.edu
Customer Outage Data
• In order to determine availability, the
actual customer-visible system outage
time needs to be collected
– In order to get this data, the customer must
place a very high priority on availability
– This data could be used to identify software
components which most reduce availability
INFO631 Week 3
49
www.ischool.drexel.edu
Availability
• We also expect that availability for a new
system should increase over the first
couple years of its use
• Defect causal analysis can help reduce
the root cause of defects, thereby
improving availability
INFO631 Week 3
50
www.ischool.drexel.edu

similar documents