Normalization

Report
Normalization
Stanislava Armstrong
http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html
1
Last Lecture
Determine the outcome from executing each of the following
statements on the Holidays table below:
ID
Destination
Price
1
Cuba
0
2
NULL
250
3
Cuba
750
4
Colombia
NULL
5
Cuba
500
6
‘’
1000
SELECT Destination, COUNT(*)
FROM Holiday
WHERE ISNULL(Price) = TRUE
GROUP BY Destination;
ID
Destination
Price
4
Colombia
NULL
2
Coursework
• http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/as
sessment.html
• 3 parts: design, implement, use
• There are 100 marks available
• Electronic submission through cw submit
• Due on Monday 6th Dec 2010, 5pm
• Late submissions will be penalized by 5% per day
• If you are working from home pay special attention
to be consistent with the names of columns
3
Normalization
There are many forms of normalization:
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Cod Normal Form
• Fourth Normal Form
• Fifth Normal Form
• Domain/Key Normal Form
• Sixth Normal Form
4
Redundancy and Normalisation
• Redundant data
– Can be determined from
other data in the database
– Leads to various problems
• INSERT anomalies
• UPDATE anomalies
• DELETE anomalies
• Normalisation
– Aims to reduce data
redundancy
– Redundancy is expressed in
terms of dependencies
– Normal Forms are
hierarchical
– Each normal form required
the removal of a particular
type of dependency in
addition to those removed for
the normal forms that comes
before it.
5
First Normal Form
• In most definitions of the
relational model
– All data values should be
atomic
– This means that table entries
should be single values, not
sets or composite objects
• A relation is said to be in
first normal form (1NF) if all
data values it contains are
atomic
• To convert to a 1NF relation,
split up any non-atomic
values
6
Normalisation to 1NF
Order_date
Kingdom
King
Execution_num
Execution_date
2010-11-04,
2010-11-12
Fairyland
Queen Fairy
1,
2
2010-11-17
2010-11-14
Trolland
The Troll King
3
2010-11-18
Order_date Kingdom King
Execution_num
Execution_date
2010-11-04 Fairyland Queen Fairy
1
2010-11-17
2010-11-12 Fairyland Queen Fairy
2
2010-11-17
The Troll King 3
2010-11-18
2010-11-14 Trolland
7
Problems in 1NF
• INSERT anomalies
– Can't add a new kingdom
without adding an
Order_date for it
• DELETE anomalies
– If we remove the order from
2010-11-14 we will remove
Trolland as well
• UPDATE anomalies
– To change the king of
Fairyland we have to change
two rows
Order_date Kingdom King
Execution_num
Execution_date
2010-11-04 Fairyland Queen Fairy
1
2010-11-17
2010-11-12 Fairyland Queen Fairy
1
2010-11-17
The Troll King 2
2010-11-18
2010-11-14 Trolland
8
Functional Dependencies
• Redundancy is often caused by a
functional dependency
• A functional dependency (FD) is a
link between two sets of
attributes in a relation
• We can normalise a relation by
removing undesirable FDs
• A set of attributes, A, functionally
determines another set, B, or:
there exists a functional
dependency between A and B (A
 B), if whenever two rows of
the relation have the same values
for all the attributes in A, then
they also have the same values
for all the attributes in B.
9
Properties of FDs
• In any relation
– The primary key FDs any set
of attributes in the relation
KX
• K is the primary key, X is a
set of attributes
– Same for candidate keys
– Any set of attributes is a FD
on itself
XX
• Rules for FDs (Just for fun:)
– Reflexivity: If B is a subset of
A then
AB
– Augmentation: If A  B
then
AUCBUC
– Transitivity:
If A  B and B  C then A  C
10
FDs and Normalisation
• We define a set of 'normal
forms'
– Each normal form has fewer
FDs than the last
– Since FDs represent
redundancy, each normal
form has less redundancy
than the last
• Not all FDs cause a problem
– We identify various sorts of
FD that do
– Each normal form removes a
type of FD that is a problem
– We will also need a way to
remove FDs
11
Representation
•
•
•
•
Diagrammatically functional dependencies are represented by arrows which start from
the set of columns, which functionally determines the set of columns where the arrow
ends
The candidate keys in the relation are represented by a box which is drawn around them
Trivial FD are not represented on the diagram
FDs dependent on the whole candidate key are not represented on the diagram
Order_date
Kingdom
King
Execution_num
Execution_date
Functional dependencies can also be represented in the following way:
• {Execution_num}  {Execution_date}
• {Kingdom}  {King}
12
Partial FDs and 2NF
•
Partial FDs:
•
Let us call attributes which are part of some candidate key, key attributes, and the
rest non-key attributes.
– A FD, A  B is a partial FD, if some attribute of A can be removed and the FD still holds
– Formally, there is some proper subset of A,
C  A, such that C  B
Second normal form:
• A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is
partially dependent on a candidate key
• In other words, no C  B where C is a strict subset of a candidate key and B is a nonkey attribute.
• Normalizing to 2NF can be achieved by splitting the relation to two relations, where:
– Given that the original relation is not in 2NF due to FD A  B
– The first new relation will contain all of the columns contained in A and from B
– The second relation will contain all of the columns which are in the original relation,
but are not contained in either A or B, and the columns contained in A
13
Example
Order_date
Order_date
Kingdom
Kingdom
King
Execution_num
Execution_num
Execution_Date
Execution_Date
Kingdom
King
14
Problems Resolved in 2NF
• Problems in 1NF
• In 2NF we have resolved all
of those problems, but we
have some other problems
remaining
– INSERT – Can't add a new
kingdom without adding an
Order_date for it
– UPDATE – To change the king
of Fairyland we have to
change two rows
– DELETE – If we remove the
order from 2010-11-14 we
will remove Trolland as well
Order_date Kingdom Execution_num
Execution_date
2010-11-04 Fairyland 1
2010-11-17
2010-11-12 Fairyland 1
2010-11-17
2010-11-14 Trolland
2010-11-18
2
15
Problems Remaining in 2NF
• INSERT anomalies
– Can’t assign a date to an execution number that is not yet assigned to
an order
• UPDATE anomalies
– To change the Execution_date for execution 1 we have to change two
rows
Order_date Kingdom Execution_num
Execution_date
2010-11-04 Fairyland 1
2010-11-17
2010-11-12 Fairyland 1
2010-11-17
2010-11-14 Trolland
2010-11-18
2
16
Transitive FDs and 3NF
• Transitive FDs:
– A FD, A  C is a transitive FD, if
there is some set B such that A 
B and B  C are non-trivial FDs
– A  B non-trivial means: B is not a
subset of A
– We have
ABC
• Third normal form
– A relation is in third normal form
(3NF) if it is in 2NF and no non-key
attribute is transitively dependent
on a candidate key
• To move a relationship from 2NF
to 3NF:
– Given the transitive FD A  B 
C
– We split the relation into two
new relations
– The first contains all of the
columns contained in B and C
– The second contains all of the
columns which are not contained
in A, B or C and the columns
contained in A and B
17
Example
Order_date
Kingdom
Execution_num
Order_date
Kingdom
Execution_num
Kingdom
Execution_Date
Execution_num
Kingdom
King
Execution_Date
King
18
Problems Resolved in 3NF
• Problems in 2NF
– INSERT – Can’t assign a date
to an execution number that
is not yet assigned to an order
– UPDATE – To change the
Execution_date for execution
1 we have to change two
rows
• In 3NF all of these are
resolved (for this relation –
but 3NF can still have
anomalies!)
Order_date Kingdom Execution_num
Execution_num
Execution_date
2010-11-04 Fairyland 1
1
2010-11-17
2010-11-12 Fairyland 1
1
2010-11-17
2010-11-14 Trolland
2
2010-11-18
2
19
Problems Remaining in 3NF
• INSERT anomalies
– You can’t add melon without adding a kingdom
• UPDATE anomalies
– Changing the execution_num for the Fairlyland order, means changing
two rows
• DELETE anomalies
– Deleting Fairlyland deletes apple
Order_date
Kingdom
Fruit
Execution_num
2010-11-04
Fairyland
apple
1
2010-11-12
Fairyland
kiwi
1
2010-11-14
Trolland
kiwi
2
20
Boyce-Codd Normal Form
• A relation is in Boyce-Codd
normal form (BCNF) if for every
FD A  B either
– B is contained in A (the FD is
trivial), or
– A contains a candidate key of the
relation,
• The same as 3NF except in 3NF
we only worry about non-key Bs
• If there is only one candidate key
then 3NF and BCNF are the same
• In other words: every
determinant in a non-trivial
dependency is a (super) key.
21
Example
Order_date
fruit
OrderID
Kingdom
fruit
OrderID
Order_date
Kingdom
OrderID
22
Decomposition Properties
• Lossless: Data should not
be lost or created when
splitting up relations
• Dependency preservation:
It is desirable that FDs are
preserved when splitting
up relations
• Normalisation to 3NF is
always lossless and
dependency preserving
• Normalisation to BCNF is
lossless, but may not
preserve all dependencies
23
Normalisation and Design
• Normalisation is related to
DB design
– A database should normally
be in 3NF at least
– If your design leads to a non3NF DB, then you might want
to revise it
• When you find you have a
non-3NF DB
– Identify the FDs that are
causing a problem
– Think whether they will lead
to any insert, update, or
delete anomalies
– Try to remove them
24
Normalisation Summary
• First normal form
– All data values are atomic
• Second normal form
– In 1NF plus no non-key
attribute is partially
dependent on a candidate
key
• Third normal form
– In 2NF plus no non-key
attribute depends
transitively on a candidate
key
• Boyce-Codd Normal Form
– In 2NF plus no attribute
depends on a non super key
25
Denormalisation
• Normalisation
– Removes data redundancy
– Solves INSERT, UPDATE, and
DELETE anomalies
– This makes it easier to
maintain the information in
the database in a consistent
state
• However
– It leads to more tables in the
database
– Often these need to be joined
back together, which is
expensive to do
– So sometimes (not often) it is
worth ‘denormalising’
26
Denormalisation
• You might want to denormalise if
– Database speeds are unacceptable (not just a bit
slow)
– There are going to be very few INSERTs, UPDATEs,
or DELETEs
– There are going to be lots of SELECTs that involve
the joining of tables
27
Exercise
• Given the following schema determine what
normal form it is in and transform it to BCNF if
possible.
Price
Paid
Due
Hotel
Location
Client
Client_address
BookingID
28
Reading Material
• The Manga Guide to Databases – chapter 3
• Database Systems – A Practical Approach to Design,
Implementation and Management by Connolly and
Begg –chapters 13 and 14
• Any other book – chapter on normalization
29

similar documents