Report

Chapter 3 Notes 3.1 Functional Dependencies • A functional dependency is a statement that – two tuples of a relation that agree on some particular set of attributes must also agree on some other particular set of attributes. Functional Dependencies continued • Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs • FDs and keys are used to define normal forms for relations • FDs are constraints that are derived from the meaning and interrelationships of the data attributes • A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y Examples of FD constraints • social security number determines employee name SSN -> ENAME • project number determines project name and location PNUMBER -> {PNAME, PLOCATION} • employee ssn and project number determines the hours per week that the employee works on the project {SSN, PNUMBER} -> HOURS Keys of a Relation • A set of one or more attributes is a key of a relation if – Those attributes functionally determine all other attributes of the relation. – No proper subset of the prospective key functionally determine all other attributes of the relation • A superkey for a relation is a set of attributes that functionally determines all the attributes of the relation . • A key is a superkey, no proper subset of which is also a superkey 3.2 Reasoning About Functional Dependencies • There are many rules that let us infer that one FD X → A holds in any relation instance that satisfies some other given set of FD's . • To verify that X → A holds , compute the closure of X , using the given FD's to expand X until it includes A . Armstrong’s inference rules IR1. (Reflexive) If Y subset-of X, then X -> Y IR2. (Augmentation) If X -> Y, then XZ -> YZ (Notation: XZ stands for X U Z) IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z IR1, IR2, IR3 form a sound and complete set of inference rules More inference rules (Decomposition) If X -> YZ, then X -> Y and X -> Z (Union) If X -> Y and X -> Z, then X -> YZ (Psuedotransitivity) If X -> Y and WY -> Z, then WX -> Z The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property) One more inference • Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F • Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X • X + can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F • Closure algorithm on page 76 Equivalence of Sets of FDs • Two sets of FDs F and G are equivalent if: – every FD in F can be inferred from G, and – every FD in G can be inferred from F • Hence, F and G are equivalent if F + =G + Definition: F covers G if every FD in G can be inferred from F (i.e., if G + subset-of F +) • F and G are equivalent if F covers G and G covers F • There is an algorithm for checking equivalence of sets of FDs 3.2.7 Minimal Basis for a set of FD's • For any set of FD 's, there is at least one minimal basis, – which is a set of FD's equivalent to the original (each set implies the other set), with singleton right sides , – no FD that can be eliminated while preserving equivalence, and – no attribute in a left side that can be eliminated while preserving equivalence. 3.3 Design of Relational Database Schemas • Problems that arise when our schema is poorly designed – Redundancy – Update anomalies – Deletion anomalies • The accepted way to eliminate anomalies is to decompose relations. – Split attributes of a relation to make the schemas of two new relations Anomalies (1) Consider the relation: EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours) Update Anomaly: • Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1. Anomalies (2) EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours) • Insert Anomaly: Cannot insert a project unless an employee is assigned to it. – Inversely, cannot insert an employee unless an he/she is assigned to a project. • Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. – Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project Normalization of Relations (1) • Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form Normalization of Relations (2) • 2NF, 3NF, BCNF based on keys and FDs of a relation schema • 4NF based on keys, multi-valued dependencies : MVDs; • 5NF based on keys, join dependencies : JDs • Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation) Boyce-Codd Normal Form • A relation is in BCNF if the only nontrivial FD's say that some superkey functionally determines one or more of the other attributes. • A major benefit of BCNF is that it eliminates redundancy caused by the existence of FD's. Decomposition into BCNF (1) • By repeatedly choosing suitable decompositions, we can break any relation into a collection of subsets of its attributes – These subsets are the schemas of relations in BCNF – The data in the original relation is represented faithfully by the data in the decomposition relations Decomposition into BCNF (2) • Two FDs exist in the relation TEACH: fd1: { student, course} -> instructor fd2: instructor -> course • {student, course} is a candidate key for this relation and that the dependencies shown follow a pattern. So this relation is in 3NF but not in BCNF • A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations. Decomposition into BCNF (3) Three possible decompositions for relation TEACH 1. {student, instructor} and {student, course} 2. {course, instructor } and {course, student} 3. {instructor, course } and {instructor, student} • • • All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition. Out of the above three, only the 3rd decomposition will not generate spurious tuples after join.(and hence has the nonadditivity property). There is a test to determine whether a binary decomposition (decomposition into two relations) is nonadditive (lossless) 3.4 Decomposition: The Good, Bad, and Ugly • Distinct properties a decomposition should have – Elimination of anomalies by decomposition – Recoverability of information, original relation from tuples in its decomposition – Preservation of dependencies • If we check the projected FD’s in the decomposition relations, can we be sure • When we reconstruct the original relation from the decomposition by joining, the result will satisfy the original FD’s Lossless-Join Decomposition • A useful property of a decomposition is that the original relation can be recovered exactly by taking the natural join of the relations in the decomposition. • Any decomposition gives us back at least the tuples with which we start, but – a carelessly chosen decomposition can give tuples in the join that were not in the original relation The Chase • We can test whether a decomposition has the lossless - join property by setting up a tableau – – a set of rows that represent tuples of the original relation. • We chase a tableau by applying the given functional dependencies to infer that certain pairs of symbols must be the same. • The decomposition is lossless with respect to a given set of FD' s if and only if – the chase leads to a row identical to the tuple whose membership in the join of the projected relations we assumed. Dependency-Preserving Decomposition • Another desirable property of a decomposition is that we can check all the functional dependencies that hold in the original relation by checking FD's in the decomposed relations. Third Normal Form and BCNF • Sometimes decomposition into BCNF can lose the dependency - preservation property. • 3NF can be thought of as a relaxed form of BCNF – allows an FD X → A even if X is not a superkey , provided A is a member of some key . • 3NF does not guarantee to eliminate all redundancy due to FD's , but often does so. 3.5 Third Normal form in the text • A relation is in third normal form (3NF) if – Whenever A1A2….An → B1B2….Bm is a nontrivial FD, – either { A1A2….An} is a superkey, or – those of B1B2….Bm that are not among the A’s are each of member of some key (not necessarily the same key) Synthesis Algorithm for 3NF • • • • If we take a minimal basis for a given set of FD's, turn each of these FD's into a relation, and add a key for the relation, if necessary, the result is a decomposition into 3NF that has the lossless-join and dependency-preservation properties. 3.6 Multivalued Dependencies • A multivalued dependency is a statement that two sets of attributes in a relation have sets of values that appear in all possible combinations. • A multivalued dependency is a statement about some relation R that when you fix the values for one set of attributes, the the values in certain other attributes are independent of the values of all other attributes in the relation Multivalued Depenency (MVD) intro 1 of 2 • In many cases relations have constraints that cannot be specified as functional dependencies. • Multivalued dependencies are a consequence of the first normal form (1NF) which disallows an attribute in a tuple to have a set of values. MVD intro 2 of 2 • If we have two or more multivalued independent attributes in the same relation schema, • we get into a problem of having to repeat every value of one of the attributes with every value of the other attribute – to keep the relation state consistent and – to maintain the independence among the attributes involved. • This constraint is specified by a multivalued dependency Fourth Normal Form • MVD's can also cause redundancy in a relation. • 4NF is like BCNF, but also forbids nontrivial MVD's whose left side is not a superkey. • It is possible to decompose a relation into 4NF without losing information. • 4NF implies BCNF implies 3NF 3.7 An Algorithm for Discovering MVD’s Chase-based test for whether X → Y follows from F can be summarized as: 1. Start with a tableau having tow rows that agree on only X. 2. Chase the tableau using the FD’s of F 3. If the final tableau agrees in all columns of Y, then X → Y holds; otherwise it does not Reasoning About MVD's • We can infer MVD's and FD's from a given set of MVD's and FD's by a chase process. – We start with a two-row tableau that represent the dependency we are trying to prove. – FD's are applied by equating symbols , and MVD's are applied by adding rows to the tableau that have the appropriate components interchanged .