### chap04 - Gonzaga University

```Chapter 4
Logical Database Design and
the Relational Model
Jason C. H. Chen, Ph.D.
Professor of MIS
Gonzaga University
Spokane, WA 99258
[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */
TM 4-1
Objectives
•
•
•
•
•
•
•
Define terms
List five properties of relations
State two properties of candidate keys
Define first, second, and third normal form
Describe problems from merging relations
Transform E-R and EER diagrams to relations
Create tables with entity and relational integrity
constraints
• Use normalization to convert anomalous tables to
well-structured relations
TM 4-2
Problem Solving for Modeling a Database Project
Study and Analyze
w/Team
User interview &
Integrated Model
ER or EER or OO
?????
?????
?????
Normalization
(3NF)
TM 4-3
Relation
• Definition: A relation is a ______, ______dimensional table of data
– Table is made up of rows (records), and columns
(attribute or field)
• Not all tables qualify as relations
• Requirements:
– Every relation has a unique name.
– Every attribute value is atomic (not multivalued, not
composite)
– Every row is unique (can’t have two rows with exactly
the same values for all their fields)
– Attributes (columns) in tables have unique names
– The order of the columns is irrelevant
– The order of the rows is irrelevant
NOTE: all relations are in
1st Normal form
TM 4-4
Correspondence with ER Model
• Relations (tables) correspond with entity types and
with many-to-many relationship types
• Rows correspond with entity instances and with manyto-many relationship instances
• Columns correspond with attributes
• NOTE: The word relation (in relational database) is
NOT the same as the word relationship (in ER
model)
TM 4-5
Key Fields
• Keys are special fields that serve two main purposes:
– _______ keys are unique identifiers of the relation in
question. Examples include employee numbers, social
security numbers, etc. This is how we can guarantee that
all rows are unique
– _______ keys are identifiers that enable a dependent
relation (on the many side of a relationship) to refer to its
parent relation (on the one side of the relationship)
• Keys can be simple (a single field) or composite
(more than one field)
• Surrogate key
• Keys usually are used as indexes to speed up the
response to user queries (More on this in Ch. 5)
TM 4-6
Sample E-R Diagram (Figure 2-1)
TM 4-7
Figure 4-3 Schema for four relations (Pine Valley Furniture Company)
Primary Key
Foreign Key
(implements 1:N relationship
between customer and order)
Combined, these are a composite primary
key (uniquely identifies the order
line)…individually they are foreign keys
(implement M:N relationship between
order and product)
TM 4-8
Fig. 4-3: Schema for four relations (Pine Valley Furniture)
Graphical and Text Representations
CUSTOMER(Customer_ID,
City,State,Zip)
ORDER(Order_ID,
Order_Date,Customer_ID)
ORDER_LINE(Order_ID,
Product_ID,Quantity)
PRODUCT(Product_ID,
Product_Description,
Product_Finish,Unit_Price,
On_Hand)
TM 4-9
Fig. 4-1: EMPLOYEE1 Relation with sample data
EMPLOYEE1
EmpID
Name
DeptName
Salary
100
140
110
190
150
Margaret Simpson
Allen Beeton
Chris Lucero
Lorenzo Davis
Susan Martin
Marketing
Accounting
Info. System
Finance
Marketing
48,000
52,000
43,000
55,000
42,000
TM 4-10
Fig. 4-2: Eliminating multi-valued attributes
(a) Table with repeating groups or multi-valued attributes
(Un-Normalized)
EMPLOYEE
EmpID Name
DeptName
100
Margaret Simpson Marketing
140
110
Allen Beeton
Chris Lucero
Accounting
Info. System
190
150
Lorenzo Davis
Susan Martin
Finance
Marketing
Salary Course
Title
Date
Completed
48,000 SPSS
6/19/200X
Surveys 10/7/200X
52,000 Tax Acc 12/8/200X
43,000 SPSS
1/12/200X
C++
4/22/200X
55,000
42,000 SPSS
6/16/200X
Java
8/12/200X
TM 4-11
Figure 4-2 (a) Table with repeating groups – how to “remove” them (and solve the problem)
EmpID
Name
DeptName
100
Margaret Simpson
Marketing
140
110
Allen Beeton
Chris Lucero
Accounting
Info. System
190
150
Lorenzo Davis
Susan Martin
Finance
Marketing
Salary Course
Title
48,000 SPSS
Surveys
52,000 Tax Acc
43,000 SPSS
C++
55,000
42,000 SPSS
Java
Date
Completed
6/19/200X
10/7/200X
12/8/200X
1/12/200X
4/22/200X
6/16/200X
8/12/200X
Figure 4-2 (b) EMPLOYEE2 relation
EmpID
Name
DeptName
100
Margaret Simpson Marketing
100
Margaret Simpson Marketing
140
Allen Beeton
Accounting
110
Chris Lucero
Info. System
110
Chris Lucero
Info. System
190
Lorenzo Davis
Finance
150
Susan Martin
Marketing
150
Martin
Marketing
Wesley Longman,
Inc. & Dr. Chen, Business Database
Systems
Salary Course Date
Title
Completed
48,000 SPSS
6/19/200X
48,000 Surveys 10/7/200X
52,000 Tax Acc 12/8/200X
43,000 SPSS
1/12/200X
43,000 C++
4/22/200X
55,000
42,000 SPSS
6/16/200X
42,000 Java
8/12/200XTM 4-12
Fig. 4-2: Eliminating multi-valued attributes
(b) EMPLOYEE2 Relation (Normalized)
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course
Title
Date
Completed
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
6/16/200X
8/12/200X
SPSS
6/19/200X
Surveys 10/7/200X
Tax Acc 12/8/200X
SPSS
1/12/200X
C++
4/22/200X
SPSS
Java
TM 4-13
Constraints
• Domain Constraints
– Allowable values for an attribute.
– A domain definition contains: domain name, data
type, size, meaning, and allowable values/range (if
applicable).
• Entity Integrity
– No primary key attribute may be null.
• Referential Integrity
– A relationship between primary key and foreign key.
• Operational Constraints
– Business rules (see Chapter 3)
TM 4-14
Integrity Constraints
• _________ Integrity – rule that states that any foreign
key value (on the relation of the many side) MUST
match a primary key value in the relation of the one side.
(Or the foreign key can be null)
– For example: Delete Rules
• Restrict – don’t allow delete of “parent” side if related
rows exist in “dependent” side
• Cascade – automatically delete “dependent” side rows that
correspond with the “parent” side row to be deleted
CONSTRAINTS; (p.110 of Oracle 11g)
• Set-to-Null – set the foreign key in the dependent side to
null if deleting from the parent side  not allowed for weak
entities
TM 4-15
Fig. 4-5: Referential integrity constraints (Pine Valley Furniture)
pk
fk
pk
cpk/pk
fk
fk
Referential
integrity
constraints are
drawn via arrows
from dependent to
parent table
pk
TM 4-16
Figure 4-6 SQL table definitions
Referential integrity
constraints are
implemented with
foreign key to
primary key
references
TM 4-17
CUSTOMER
Customer_ID
Customer_Name
123
345
489
789
…
John Smith
Allen Stone
Mary Jones
Kent Watson
S. 34 Freya, Spokane
45 A St. Cheney
E. 21 Trent, Spokane
23 Y Ave. Spokane
PK
ORDER
FK
Order_ID
Order_Date
Customer_ID
OR-002
OR-003
OR-004
OR-004
…
9/21/1998
9/30/1998
10/1/1998
10/2/1998
123
345
123
489
642
TM 4-18
STUDENT
pk
SID
SLNAME
100
Miller
…
105
Connoly
…
9188
FACULTY
pk
FID
1
…
5
fk
FID
1
…
3
fk
FLNAME
Cox
…
Brown
LOCATION
pk
LOCID
BLDG_CODE
45
CR
…
57
LIB
pk
(FACULTY)
SPIN
8891
LOCID
53
…
57
ROOM
101
222
…
FPIN
1181
…
9899
CAPACITY
150
…
1
fk
(STUDENT)
RULES:
1. You can’t add a record to the table with fk, (e.g., STUDENT) unless
there is a corresponding record in the table with pk.
2. You can’t delete a record in the table with pk (e.g., FACULTY) if there is a record in the
table with fk.
 Order of entering data into the database: LOCATION  FACULTY STUDENT
 Order of deleting data from the database:STUDENT  FACULTY  LOCATION
Well-Structured Relations
• A well-structured relation contains minimal
redundancy and allows users to insert,
modify, and delete the rows in a table
without errors or inconsistencies.
• The following anomalies should be
removed for a well-structured relation:
– Insertion Anomaly
– Deletion Anomaly
– Modification Anomaly
TM 4-20
Is EMPLOYEE2 a WellStructured relation?
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course
Title
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
Date
Completed
SPSS
6/19/200X
Surveys 10/7/200X
Tax Acc 12/8/200X
SPSS
1/12/200X
C++
4/22/200X
SPSS
Java
6/16/200X
8/12/200X
TM 4-21
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course
Title
Date
Completed
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
SPSS
Surveys
Tax Acc
SPSS
C++
6/19/200X
10/7/200X
12/8/200X
1/12/200X
4/22/200X
SPSS
Java
6/16/200X
8/12/200X
TM 4-22
Is EMPLOYEE2 a WellStructured relation?
No/Yes
WHY?
TM 4-23
__________Anomaly: Inserting a new row, the user
must supply values for both EmpID (PK) and CourseTitle
(CPK and FK). This is an (insertion) anomaly, since the
user should be able to enter employee data without
knowing (supplying) course (title) data.
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course Date
Title
Completed
48,000 SPSS
6/19/200X
48,000 Surveys 10/7/200X
52,000 Tax Acc 12/8/200X
43,000 SPSS
1/12/200X
43,000 C++
4/22/200X
55,000
42,000 SPSS
6/16/200X
42,000 Java
8/12/200X
TM 4-24
Deletion Anomaly: Deleting the employee number
140, it results in losing not only the employee’s
information but also the course had an offering that
completed on that date.
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course Date
Title
Completed
48,000 SPSS
6/19/200X
48,000 Surveys 10/7/200X
52,000 Tax Acc 12/8/200X
43,000 SPSS
1/12/200X
43,000 C++
4/22/200X
55,000
42,000 SPSS
6/16/200X
42,000 Java
8/12/200X
TM 4-25
________ Anomaly: If the employee number 100 gets
a salary increase, we must record the increase in each of
the rows for that employee (two occurences); otherwise
the data will be inconsistent.
EMPLOYEE2
EmpID Name
100
100
140
110
110
190
150
150
DeptName
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course Date
Title
Completed
48,000 SPSS
6/19/200X
48,000 Surveys 10/7/200X
52,000 Tax Acc 12/8/200X
43,000 SPSS
1/12/200X
43,000 C++
4/22/200X
55,000
42,000 SPSS
6/16/200X
42,000 Java
8/12/200X
TM 4-26
Fig. 4-7: EMP_COURSE: Normalized Relations from EMPLOYEE2
EMPLOYEE2
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson
Margaret Simpson
Allen Beeton
Chris Lucero
Chris Lucero
Lorenzo Davis
Susan Martin
Susan Martin
Marketing
Marketing
Accounting
Info. System
Info. System
Finance
Marketing
Marketing
Salary Course
Title
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
SPSS
6/19/201X
Surveys 10/7/201X
Tax Acc 12/8/201X
SPSS
1/12/201X
C++
4/22/201X
SPSS
Java
EMPLOYEE1
EmpID
100
140
110
190
150
Name
Margaret Simpson
Allen Beet
Chris Lucero
Lorenzo Davis
Sususan Martin
Date
Completed
6/16/201X
8/12/201X
EMP_COURSE
DeptName Salary
Marketing
Accounting
Info. System
Finance
Marketing
48,000
52,000
43,000
55,000
42,000
Is
there
any
anomaly
?
EmpID
100
100
140
110
110
150
150
??
Course Date
Title
Completed
SPSS
6/19/201X
Surveys 10/7/201X
Tax Acc 12/8/201X
SPSS
1/12/201X
C++
4/22/201X
SPSS
6/19/201X
Java
8/12/201X
TM 4-27
Problem Solving for Modeling a Database Project
Study and Analyze
w/Team
User interview &
Integrated Model
ER or EER or OO
Relations Transformation
(Seven Cases)
Transformation to Relations
?????
Normalization
(3NF)
TM 4-28
Seven Cases of Transforming
EE-R Diagrams into Relations
1. Map Regular Entities
2. Map Weak Entities
3. Map Binary Relationships
4. Map Associative Entities
5. Map Unary Relationships
6. Map Ternary (and n-ary) Relationships
7. Map Supertype/Subtype Relationships
TM 4-29
Transforming EE-R Diagrams
into Relations
1. Map Regular Entities to Relations
 E-R attributes map directly onto the relation
(Fig. 4-8)
 Composite attributes: Use only their simple,
component attributes (Fig. 4-9).
 Multi-valued Attribute : Becomes a separate
relation with a foreign key taken from the
superior entity (Fig. 4-10).
TM 4-30
Fig. 4-8: Mapping the regular entity CUSTOMER
(a) CUSTOMER entity type
(b) CUSTOMER relation
[Same name on relation and entity type]
TM 4-31
Figure 4-9 Mapping a composite attribute
(a) CUSTOMER
entity type with
composite
attribute
(b) CUSTOMER relation with address detail
[Use only their simple, component attributes]
TM 4-32
Fig. 4-10: Mapping an entity with a multivalued attribute
(a) Employee entity type with multivalued attribute
Multivalued attribute becomes a separate relation with foreign key
(b) Mapping a multivalued attribute
EMPLOYEE
EmployeeID
EmployeeID
EmployeeName
Skill
One–to–many relationship between original entity and new relation
[Two
relations
created
with one
containing
all of the
attributes
except the
multivalued
attribute,
and the
second one
contains
the pk (on
the first
one) and
the multivalued
attribute]
TM 4-33
Break ! (Ch. 4 - Part I)
In class exercise
#1-I (a) (p.193),
apply Figure 2-8.
HW
# 1-I (c), apply
Figure 2-11.a
Fig. 3-8:
Figure 2-8
TM 4-34
Problem Solving for Modeling a Database Project
Study and Analyze
w/Team
User interview &
Integrated Model
ER or EER or OO
Relations Transformation
(Seven Cases)
Transformation to Relations
?????
Normalization
(3NF)
TM 4-35
Seven Cases of Transforming
EE-R Diagrams into Relations
1. Map Regular Entities
2. Map Weak Entities
3. Map Binary Relationships
4. Map Associative Entities
5. Map Unary Relationships
6. Map Ternary (and n-ary) Relationships
7. Map Supertype/Subtype Relationships
TM 4-36
Transforming EER Diagrams into
Relations
2. Mapping Weak Entities
– Becomes a separate relation with
a foreign key taken from the
superior entity
– Primary key composed of:
• Partial identifier of weak entity
• Primary key of identifying relation
(strong entity) (Fig. 4-11)
TM 4-37
Fig. 4-11: Example of mapping a weak entity
(a) Weak entity DEPENDENT
Fig. 4-11: (b) Relations resulting from weak entity
[Becomes a separate relation with a foreign
key taken from the superior entity]
PK
CPK
NOTE: the domain constraint for the foreign key should
NOT allow null value if DEPENDENT is a weak entity
FK
Question: Why need all FOUR attributes to be a CK?
TM 4-38
Transforming EE-R Diagrams
Into Relations
3. Map Binary Relationships
– One-to-Many - Primary key on the one side
becomes a foreign key on the many side (Fig.
4-12).
– Many-to-Many - Create a new relation with the
primary keys of the two entities as its primary
key (Fig. 4-13).
– One-to-One - Primary key on the mandatory
side becomes a foreign key on the optional side
(Fig. 4-14).
TM 4-39
Fig. 4-12: Example of mapping a 1:M relationship
(a) Relationship between customers and orders
Note the mandatory one
Fig. 4-12: (b) Mapping the relationship
[Primary key on the one side becomes a
foreign key on the many side]
Foreign key
Again, no null value in the
foreign key…this is because
of the mandatory minimum
cardinality
TM 4-40
Figure 4-13 Example of mapping an M:N relationship
a) Completes relationship (M:N)
The Completes relationship will need to become a separate relation
TM 4-41
Figure 4-13 Example of mapping an M:N relationship (cont.)
b) Three resulting relations
Composite primary key (cpk)
Foreign key
Foreign key
New
intersection
relation
TM 4-42
Figure 4-14 Example of mapping a binary 1:1 relationship
a) In_charge relationship (1:1)
mandatory
optional
Often in 1:1 relationships, one direction is optional.
TM 4-43
Fig. 4-14: (b) Resulting relations
mandatory
Same domain
as Nurse_ID
PK
optional
FK
[Primary key on the mandatory side becomes a foreign key on the
optional side]
TM 4-44
Transforming EE-R Diagrams
Into Relations
4. Map Associative Entities
– Identifier Not Assigned
• Default primary key for the association relation is
composed of the primary keys of the two entities (as
in M:N relationship) (Fig. 4-15)
– Identifier Assigned
• It is natural and familiar to end-users.
• Default identifier may not be unique. (Fig. 4-16).
TM 4-45
Figure 4-15 Example of mapping an associative entity
B
A
a) An associative entity
A
B
A
TM 4-46
Figure 4-15 Example of mapping an associative entity (cont.)
b) Three resulting relations
PK
fk
[Default primary key for the association
relation is the primary keys of the two entities]
cpk
fk
PK
Composite primary key formed from the two foreign keys
TM 4-47
Figure 4-16: Mapping an associative entity
(a) Associative entity (SHIPMENT)
[Default primary key
for the association
relation is assigned]
(b) Three resulting relations
Primary key differs from foreign keys
TM 4-48
Transforming EE-R Diagrams
Into Relations
5. Map Unary Relationships
– One-to-Many
• Recursive foreign key in the same relation (Fig. 4-17).
• A recursive FK is a FK in a relation that references the
PK values of that same relation. It must have the same
domain as the PK.
– Many-to-Many - Bill-of-materials: Two relations:
• One for the entity type.
• One for an associative relation in which the primary
key has two attributes, both taken from the primary
key of the entity. (Fig. 4-18).
TM 4-49
Figure 4-17 Mapping a unary 1:N relationship
(a) EMPLOYEE entity with unary relationship
A recursive
FK is a FK
in a relation
that
references
the PK
values of
that same
relation.
It must have
the same
domain as
the PK.
(b) EMPLOYEE relation with recursive foreign key
EMPLOYEE
EmployeeID EmployeeName
EmployeeDateOfBirth
ManagerID
TM 4-50
Figure 4-18: Mapping a unary M:N relationship
One for the entity
type.
One for an
associative
relation in which
the primary key
has two
attributes, both
taken from the
primary key of
the entity.
(a) Bill-ofmaterials
relationships
(M:N)
(b) ITEM and
COMPONEN
T relations
ITEM
PK
ItemNo
fk
ItemDescription
COMPONENT
ItemNo
cpk
ItemUnitCost
fk
ComponentNo
Quantity
TM 4-51
Transforming EE-R Diagrams
Into Relations
6. Map Ternary (and n-ary) Relationships
– One relation for each entity and one for the
associative entity.
– Associative entity has foreign keys to each
entity in the relationship
– (Fig. 4-19).
TM 4-52
Figure 4-19 Mapping a ternary relationship
a) PATIENT TREATMENT Ternary relationship with
associative entity
TM 4-53
Figure 4-19 Mapping a ternary relationship (cont.)
b) Mapping the ternary relationship PATIENT TREATMENT
A patient may receive a treatment
one in the morning, then the same
treatment in the afternoon.
TM 4-54
Figure 4-19 Mapping a ternary relationship (cont.)
b) Mapping the ternary relationship PATIENT TREATMENT
Remember that
This is why
the primary treatment date and
key MUST be time are included in
unique
the composite
primary key
But this makes a
very cumbersome
key…
It would be better
to create a
surrogate key like
Treatment#
How to create it
with Oracle?
TM 4-55
Transforming EER Diagrams into
Relations
7. Mapping Supertype/Subtype Relationships
– One relation for supertype and for each subtype
– Supertype attributes (including identifier and subtype
discriminator) go into supertype relation
– Subtype attributes go into each subtype; primary key
of supertype relation also becomes primary key of
subtype relation
– 1:1 relationship established between supertype and
each subtype, with supertype as primary table (Fig. 420).
TM 4-56
Figure 4-20 Supertype/subtype relationships
TM 4-57
Figure 4-21
Mapping Supertype/subtype relationships to relations
These are
implemented as oneto-one relationships
Display a table that
contains all the attributes
for
SALARIED_EMPLOYEE
SELECT *
FROM EMPLOYEE,
SALARIED_EMPLOYEE
WHRE Employee_Number=
S_Employee_Number;
TM 4-58
TM 4-59
Break ! (Ch. 4 - Part II)
Figure 3-6(b)
In class exercise
Transform it to relations
(NOT 3NF)
#2-III-a , (p.193, apply
Figure 3-6.b
HW
#2-III-d, (p.193), apply
Figure 3-10
Partial
Specialization
Fig. 3-8:
TM 4-60
Break ! (Ch. 4 - Part II)
In class exercise (another set)
(p.193)
Transform it to relations (NOT
3NF)
#2-III (b)
(Figure 3-7a)
(see next slide)
HW
#2-III (c)
(Figure 3-9)
TM 4-61
Fig. 3-7: Examples of disjointness constraints
(a) Disjoint rule
TM 4-62
Fig. 3-9: Subtype discriminator (overlap rule)
TM 4-63
Problem Solving for Modeling a Database Project
Study and Analyze
w/Team
User interview &
Integrated Model
ER or EER or OO
Relations Transformation
(Seven Cases)
Transformation to Relations
Normalization
Normalization
(3NF)
IMPLEMENTATION
TM 4-64
Seven Cases of Transforming
EE-R Diagrams into Relations
1. Map Regular Entities
2. Map Weak Entities
3. Map Binary Relationships
4. Map Associative Entities
5. Map Unary Relationships
6. Map Ternary (and n-ary) Relationships
7. Map Supertype/Subtype Relationships
TM 4-65
Next Topic
• Next topic is the most important topic
(theory) in this database management class.
• What is it?
• Normalization
TM 4-66
Data Normalization
• The process of decomposing
relations with anomalies to
produce smaller, wellstructured and stable relations
• Primarily a tool to validate and
improve a logical design so that
it satisfies certain constraints
that avoid unnecessary
duplication of data
TM 4-67
Well-Structured Relations
• A relation that contains minimal data redundancy
and allows users to insert, delete, and update rows
without causing data inconsistencies
• Goal is to avoid (minimize) anomalies
– Insertion Anomaly – adding new rows forces user to
create duplicate data
– Deletion Anomaly – deleting rows may cause a loss of
data that would be needed for other future rows
– Modification Anomaly – changing data in a row forces
changes to other rows because of duplication
General rule of thumb: a table should not pertain to
more than one entity type
TM 4-68
Example – Figure 4.2b
Question – Is this a relation?
Answer – Yes: unique rows and no
multivalued attributes
Question – What’s the primary key?
a composite key
TM 4-69
Anomalies in this Table
• Insertion – can’t enter a new employee without having
the employee take a class
• Deletion – if we remove employee 140, we lose
information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee
100 forces us to update multiple records
Why do these anomalies exist?
Because are two themes (entity types – what are they?) in this one
relation (two themes, entity types, were combined). This results in
duplication, and an unnecessary dependency between the entities
TM 4-70
Functional Dependencies and Keys
• Functional Dependency: The value of one attribute (the
determinant) determines the value of another attribute.
• Candidate Key
– A unique identifier. One of the candidate keys will become
the primary key
• E.g. perhaps there is both credit card number and SS# in
a table…in this case both are candidate keys
– Each non-key field is functionally dependent on every
candidate key
TM 4-71
Figure 4-23: Representing Functional Dependencies (cont.)
(a) Functional dependencies in EMPLOYEE1 Fig. 4-2a)
(b) Functional dependencies in EMPLOYEE2 (Fig. 4-2b)
TM 4-72
First Normal Form
• No multivalued attributes
• Every attribute value is atomic (singledvalue)
• Fig. 4-2a is not in 1st Normal Form
(multivalued attributes)  it is not a relation
• Fig. 4-2b is in 1st Normal form (but not in a
well-structured relation)
• All relations are in 1st Normal Form
• The following example is not from the text
will be illustrated for Normalization process.
TM 4-73
Figure 4-2 (a)
EmpID
Name
DeptName
Salary
100
Margaret Simpson
Marketing
48,000
140
110
Allen Beeton
Chris Lucero
Accounting
Info. System
52,000
43,000
190
150
Lorenzo Davis
Susan Martin
Finance
Marketing
55,000
42,000
Course Date
Title
Completed
SPSS
6/19/200X
Surveys 10/7/200X
Tax Acc 12/8/200X
SPSS
1/12/200X
C++
4/22/200X
SPSS
Java
6/16/200X
8/12/200X
Figure 4-2 (b)
EmpID Name
100
100
140
110
110
190
150
150
DeptName
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course
Title
48,000 SPSS
48,000 Surveys
52,000 Tax Acc
43,000 SPSS
43,000 C++
55,000
42,000 SPSS
42,000 Java
Date
Completed
6/19/200X
10/7/200X
12/8/200X
1/12/200X
4/22/200X
6/16/200X
8/12/200X
TM 4-74
Second Normal Form
• 1NF and every non-key attribute is fully
functionally dependent on the primary key.
• Every non-key attribute must be defined by the
entire key (either a single PK or a CPK), not
by only part of the key.
• No partial functional dependencies.
• Fig. 4-2b is NOT in 2nd Normal Form
TM 4-75
Figure: 4-22 Steps in normalization
Table with Multivalued
attributes
Remove Multivalued
Attributes
First normal
form (1NF)
Second normal
form(2NF)
Remove _____
Dependencies
Remove …
Third normal
form (3NF)
Boyce-Codd normal
form (BC-NF)
Fourth normal
Form (4NF)
Fifth normal
form (5NF)
Remove remaining
anomalies resulting from
multiple candidate keys
Remove Multivalued
Dependencies
Remove Remaining
Anomalies
TM 4-76
A Process of 1NF to 2NF (EMPLOYEE2 - - 1NF)
(b) Functional Dependencies in EMPLOYEE2
Dependency on entire primary key
EmpID
CourseTitle
Name
DeptName
Salary DateCompleted
Dependency on only part of the key
EmpID
Name
DeptName
100
100
140
110
110
190
150
150
Margaret Simpson Marketing
Margaret Simpson Marketing
Allen Beeton
Accounting
Chris Lucero
Info. System
Chris Lucero
Info. System
Lorenzo Davis
Finance
Susan Martin
Marketing
Susan Martin
Marketing
Salary Course
Title
Date
Completed
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
SPSS
Surveys
Tax Acc
SPSS
C++
6/19/200X
10/7/200X
12/8/200X
1/12/200X
4/22/200X
SPSS
Java
6/16/200X
8/12/200X
TM 4-77
Functional Dependencies in EMPLOYEE2
Dependency on entire primary key
EmpID
CourseTitle
Name
DeptName
Salary
DateCompleted
Dependency on only part of the key (partial dep.)
EmpID, CourseTitle  DateCompleted
EmpID  Name, DeptName, Salary
Therefore, NOT in 2nd Normal Form!!
TM 4-78
EMPLOYEE2
EmpID
Summary on Normalization: from 1NF to 2NF
CourseTitle
Name
DeptName
Salary DateCompleted
Partial Depend.
EMPLOYEE1
EmpID Name DeptName Salary
EMP_COURSE
2NF
3NF ?
EmpID
CourseTitle
DateCompleted
TM 4-79
Summary on Normalization
EMPLOYEE2 (1NF)
EmpID
Name
DeptName
Salary Course
Title
100
100
140
110
110
190
150
150
Margaret Simpson
Margaret Simpson
Allen Beeton
Chris Lucero
Chris Lucero
Lorenzo Davis
Susan Martin
Susan Martin
Marketing
Marketing
Accounting
Info. System
Info. System
Finance
Marketing
Marketing
48,000
48,000
52,000
43,000
43,000
55,000
42,000
42,000
EMPLOYEE1
EmpID
100
140
110
190
150
Name
Margaret Simpson
Allen Beet
Chris Lucero
Lorenzo Davis
Sususan Martin
SPSS
6/19/200X
Surveys 10/7/200X
Tax Acc 12/8/200X
SPSS
1/12/200X
C++
4/22/200X
SPSS
Java
Salary
Marketing
Accounting
Info. System
Finance
Marketing
48,000
52,000
43,000
55,000
42,000
Is
there
any
anomaly
(ies)?
6/16/200X
8/12/200X
EMP_COURSE
3NF
DeptName
Date
Completed
EmpID
Course
Title
Date
Completed
100
100
140
110
110
150
150
SPSS
6/19/200X
Surveys 10/7/200X
Tax Acc 12/8/200X
SPSS
1/12/200X
C++
4/22/200X
SPSS
6/19/200X
Java
8/12/200X
TM 4-80
Third Normal Form
• 2NF and no transitive dependencies
(functional dependency between non-key
attributes.)
• Other examples
TM 4-81
Figure: 4-22 Steps in normalization
Table with Multivalued
attributes
Remove Multivalued
Attributes
First normal
form (1NF)
Second normal
form(2NF)
Third normal
form (3NF)
Boyce-Codd normal
form (BC-NF)
Fourth normal
Form (4NF)
Fifth normal
form (5NF)
Remove Partial
Dependencies
Remove _________
Dependencies
Remove remaining
anomalies resulting from
multiple candidate keys
Remove Multivalued
Dependencies
Remove Remaining
Anomalies
TM 4-82
Extra example-1: Relation with transitive dependency
(a) SALES relation with simple data
SALES
Cust_ID
Name
Salesperson
Region
8023
Anderson
101
South
9167
Bancroft
102
West
7924
Hobbs
101
South
6837
Tucker
103
East
8596
Eckersley
102
West
7018
Arnold
104
North
TM 4-83
What Anomalies might be in SALES
relation?
WHY? Because it is …
• Insertion anomaly ?
• Deletion anomaly ?
• Modification anomaly ?
Not in the 3NF (why?)
(transitive dependency)
SALES
Cust_ID
Name
Salesperson
Region
8023
Anderson
101
South
9167
Bancroft
102
West
7924
Hobbs
101
South
6837
Tucker
103
East
8596
Eckersley
102
West
7018
Arnold
104
North
N
TM 4-84
Extra example-1: Relation with transitive dependency
CustID  Name
CustID  Salesperson
CustID  Region
All this is OK
(2nd NF)
TM 4-85
Extra example-1: Relation with transitive dependency
CustID  Name
CustID  Salesperson
CustID  Region
and
Salesperson  Region
All this is OK
(2nd NF)
BUT
CustID  Salesperson  Region
implies
CustID  Region
Transitive dependency
(not in 3rd NF)
TM 4-86
Extra example-1: (b) Relations in 3NF
Remove a transitive dependency
TM 4-87
Extra example-1: Removing a transitive dependency
(a) Decomposing the SALES relation
SALES1
S_PERSON
Cust_ID
Name
Salesperson
8023
Anderson
101
101
South
9167
Bancroft
102
102
West
7924
Hobbs
101
103
East
6837
Tucker
103
104
North
8596
Eckersley
102
7018
Arnold
104
Salesperson Region
TM 4-88
Extra example-2: Relation transitive dependencies
Snum
Origin
Destination
Distance
409
618
723
824
629
Seattle
Chicago
Boston
Denver
Minneapolis
Denver
Dallas
Atlanta
Los Angeles
St. Louis
1,537
1,058
1,214
1,150
587
?NF
Insertion anomaly?
Deletion anomaly?
Modification anomaly?
TM 4-89
Extra example-2: Relation transitive dependencies
SHIPMENT
Snum Origin Destination Distance
Snum
Origin
409
618
723
824
629
Seattle
Chicago
Boston
Denver
Minneapolis
Destination
Denver
Dallas
Atlanta
Los Angeles
St. Louis
?NF
Origin
Destination
Distance
Seattle
Chicago
Boston
Denver
Minneapolis
Denver
Dallas
Atlanta
Los Angeles
St. Louis
1,537
1,058
1,214
1,150
587
TM 4-90
EMPLOYEE2
EmpID
Summary on Normalization: from 1NF to 2NF
CourseTitle
Name
DeptName
Salary DateCompleted
Partial Depend.
EMPLOYEE1
EmpID Name DeptName Salary
EMP_COURSE
2NF
3NF ?
EmpID
CourseTitle
DateCompleted
TM 4-91
Figure: 4-22 Steps in normalization
Table with Multivalued
attributes
Remove ________
Attributes
First normal
form (1NF)
Second normal
form(2NF)
Third normal
form (3NF)
Boyce-Codd normal
form (BC-NF)
Fourth normal
Form (4NF)
Fifth normal
form (5NF)
Remove _______
Dependencies
Remove ______
Dependencies
Remove remaining
anomalies resulting from
multiple candidate keys
Remove Multivalued
Dependencies
Remove Remaining
Anomalies
TM 4-92
Steps of Database Development
User view-1
User view-2
User view-3
…
…
User view-N
…
User interview &
Integrated Model
Conceptual Schema (Model)
Logical Model
(ERD or E/ERD)
(Seven) Relations ________
(more relations
produced)
_____________ (up to 3NF)
(more tables
created)
Implementation
(w/Physical Model)
TM 4-93
END of CHAPTER 4
In class exercise
(p.193)
#3- a,b,c,d
HW (using Visio)
(p.193-194)
#7 - a,b,c,d,e,f
Bonus
#8 – a,b,c,d
In-class Quiz next class
TM 4-94
MVC_Hospital HW
Logical Design Phase
Draw a entity-relationship diagram (enterprise model) for Mountain
View community Hospital, based on the narrative description of the case
and this handout (but the entities are from the five (5) figures shown
above). You should create a file and turn in with a hardcopy (called
MVC_Hospital_DD.docx) contains the following materials:
1. Read and employ materials from chapters 2,3 and 4.
2. Include entities, associations (with detail cardinality), and attributes.
3. Determine and draw the order of entering data
Next phase -- implementation, create SQL script file for table structure
and data base (values).
TM 4-95
Hint: You need to create VIEW (one or more) to help
you create SQL efficiently and effectively
See sample on the Bb
-- version 1 for charge_view that includes Patient Name
CREATE OR REPLACE VIEW charge_view(Patient_No,Patient
_Name, Item_Code, Charge) AS
SELECT patient.patient_no, patient.p_first|| ' ' ||patient.p_last,
FROM patient, pt_charg, item
WHERE item.item_code = pt_charg.item_code
AND patient.patient_n
o = pt_charg.patient_no
ORDER BY patient.p_last;
TM 4-96
MVC_Hospital
Create two script files:
1. a script file (MVC_Hospital_Lastname_Firstname.SQL) that contains a
set of commands of DROP, CREATE, and INSERT that performs the same
functions as in the script file of Northwoods.sql
2. Second script file
(MVC_Hospital_QUERIES_Lastname_Firstname.SQL) containing a set of
SQL commands that answer the questions. Test the query one/time
successfully.
Note that you may need other SQL commands and create database views
(see pptx file for introducing VIEWS) for the purpose of answering
questions easily. You may need to read other references related the SQL
from the text book (e.g., Chapter 7 of McFadden).
3. Spool (2) and save it in the file
MVC_Hospital_Spool_Lastname_Firstname.txt Finally, you create a new
file (*.docx) containing all work done from Part I and save them in the file
MVC_Hospital_Complete_Lastname_Firstname.docx. The file should
contain your class information and personal information.
TM 4-97
Normalized vs. De-normalized
• We will study the concept and technique of
“normalization and de-normalization” as
well as OLTP and OLAP.
TM 4-98
More on OLTP vs. OLAP
pk
fk
pk
pk: primary key
fk: foreign key
Fig. Extra-a: A simple
database with a relation
between two tables.
For those have database
background.
• The figure depicts a relational
database environment with two
tables.
• The first table contains
pets. The tables are related by
the single column they have in
common: Owner_ID.
• By relating tables to one another,
we can reduce ____________ of
data and improve database
performance.
• The process of breaking tables
apart and thereby reducing data
redundancy is called
_______________.
pk
OLTP vs. OLAP (cont.)
fk
pk
• Most relational databases which are designed to handle a high number of
reads and writes (updates and retrievals of information) are referred to as
________ (OnLine Transaction Processing) systems.
• OLTP systems are very efficient for high volume activities such as
cashiering, where many items are being recorded via bar code scanners in
a very short period of time.
• However, using OLTP databases for analysis is generally not very
efficient, because in order to retrieve data from multiple tables at the
same time, a query containing ________ must be used.
OLTP vs. OLAP (cont.)
• In order to keep our transactional databases running quickly and smoothly,
we may wish to create a data warehouse. A data warehouse is a type of large
database (including both current and historical data) that has been
_____________ and archived.
• Denormalization is the process of intentionally combining some tables into a
single table in spite of the fact that this may introduce duplicate data in
some columns.
Fig. Extra-b: A combination of the tables into a single dataset.
• The figure depicts what our simple example data might look like if it were
in a data warehouse. When we design databases in this way, we reduce the
number of joins necessary to query related data, thereby speeding up the
process of analyzing our data.
• Databases designed in this manner are called __________ (OnLine
Analytical Processing) systems.
Merging Relations
(View Integration)
• In a project development process, there may
be a number of separate E-R diagrams and
user views created and some of them may
be redundant.
• Therefore, some relations should be merged
to remove the redundancy.
TM 4-102
Merging Relations
(View Integration - An example)
No_Years)
Jobcode, No_Years)
TM 4-103
Merging Relations
(Problems on View Integration)
Issues to watch out for when merging entities
from different ER models:
 Synonyms: Different names, same meaning.
 Homonyms: Same name, different meanings.
 Transitive Dependencies:
 dependencies–even if relations are in 3NF prior to
merging, they may not be after merging
 Supertype/Subtype
 May be hidden prior to merging
TM 4-104
Problems on View Integration
• Synonyms: Different names, same meaning.
STUDENT1(StudentID, Name)
• Homonyms: Same name, different meanings.
STUDENT(StudentID,Name, Phone_No,
TM 4-105
Problems on View Integration
• Transitive Dependencies
STUDENT1(StudentID, Major)
the result is ...
??NF
and after removing transitive dependency
STUDENT_Major(StudentID, Major)
TM 4-106
Problems on View Integration
• Supertype/Subtype