Report

Basic Concepts of Relational Databases R1 sid Example Instances A table or relation stores records (or tuples) that have the same attributes 22 58 S1 sid Order of records is not important conceptually 22 31 58 “Sailors” and “Reserves” relations for our examples. Questions: Can we have two records with the same sid in the same sailor (e.g., S1 or S2) table? S2 sid 28 31 44 58 Can the same sailor reserve the same boat many times? 2 bid day 101 10/10/96 103 11/12/96 sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0 sname rating age yuppy 9 35.0 lubber 8 55.5 guppy 5 35.0 rusty 10 35.0 Relational Algebra Basic operations: Selection (cR ) Selects the subset of rows that satisfy a condition c from R. Projection (listR ) Keeps only the attributes in list from R. Set-difference ( R1 – R2) Finds tuples in R1, but not in R2. Union (R1 R2) Finds tuples that belong to R1 or R2. Cross-product (R1 x R2) Allows us to combine two relations. Renaming (p Rnew Rold) Allows us to rename a relation Rold to Rnew. Additional operations: Intersection, join, division: Not essential, but (very!) useful. Since each operation returns a relation, operations can be composed! (Algebra is “closed”.) 3 Projection Deletes attributes that are not in projection list. Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation. The formal projection operator has to eliminate duplicates! Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. sname rating yuppy lubber guppy rusty 9 8 5 10 sname,rating(S2) age 35.0 55.5 age(S2) 4 Selection Selects rows that satisfy selection condition. sid sname rating age 28 yuppy 9 35.0 58 rusty 10 35.0 rating 8(S2) No duplicates in result! (Why?) Schema of result identical to schema of (only) input relation. Result relation can be the input for another relational algebra operation! (Operator composition.) sname rating yuppy 9 rusty 10 sname,rating( rating 8(S2)) 5 Union, Intersection, Set-Difference sid sname rating age All of these operations take two input relations, which must be unioncompatible: •Same number of fields. •Corresponding fields have the same type. 22 31 58 44 28 dustin lubber rusty guppy yuppy 7 8 10 5 9 45.0 55.5 35.0 35.0 35.0 S1 S2 sid sname 22 dustin sid sname 31 lubber 58 rusty rating age 7 45.0 rating age 8 55.5 10 35.0 S1 S2 S1 S2 6 Cross (Cartesian)-Product, X Each row of S is paired with each row of R. Result schema has one field per field of S and R, with field names `inherited’ if possible. Conflict: Both S and R have a field called sid. (sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 7 Joins Condition Join: R c S c ( R S) (sid) sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 S (sid) bid 58 103 58 103 S.sid R.sid day 11/12/96 11/12/96 R Result schema same as that of cross-product. Fewer tuples than cross-product, might be able to compute more efficiently Sometimes called a theta-join. 8 Joins Equi-Join: A special case of condition join where the condition c contains only equalities. S sid 22 58 sname dustin rusty sid R rating age 7 45.0 10 35.0 bid 101 103 day 10/10/96 11/12/96 Result schema similar to cross-product, but only one copy of fields for which equality is specified. Natural Join: Equijoin on all common fields. 9 Basic SQL Query SELECT FROM WHERE [DISTINCT] target-list relation-list qualification target-list A list of attributes of relations in relation-list relation-list A list of relation names (possibly with a range-variable after each name). qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of <,>.=,,,) combined using AND, OR and NOT. DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! 10 Conceptual Evaluation Strategy Semantics of an SQL query defined in terms of the following conceptual evaluation strategy: Compute the cross-product of relation-list. Discard resulting tuples if they fail qualifications. Delete attributes that are not in target-list. If DISTINCT is specified, eliminate duplicate rows. This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers. 11 Example of Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103 (sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 12 A Note on Range Variables Really needed only if the same relation appears twice in the FROM clause. The previous query can also be written as: SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND bid=103 OR SELECT sname FROM Sailors, Reserves WHERE Sailors.sid=Reserves.sid AND bid=103 13 Set Operations - Union Find sid’s of sailors who’ve reserved a red or a green boat UNION: Can be used to compute the union of any two union-compatible sets of tuples (which are themselves the result of SQL queries). If we replace OR by AND in the first version, what do we get? Do we need the Sailors table? Also available: EXCEPT (corresponds to the set difference operation of relational algebra) What do we get if we replace UNION by EXCEPT? SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color=‘red’ OR B.color=‘green’) SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘red’ UNION SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color=‘green’ 14 Aggregate Operators Significant extension of relational algebra. SELECT COUNT (*) FROM Sailors S SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) single column SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2) SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’ 15 SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10 GROUP BY and HAVING So far, we’ve applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples. Consider: Find the age of the youngest sailor for each rating level. In general, we don’t know how many rating levels exist, and what the rating values for these levels are! Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!): SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i For i = 1, 2, ... , 10: 16 Queries With GROUP BY and HAVING SELECT FROM WHERE GROUP BY HAVING [DISTINCT] target-list relation-list qualification grouping-list group-qualification The target-list contains (i) an attribute list (ii) terms with aggregate operations (e.g., MIN (S.age)). The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.) 17 For each red boat, find the number of reservations for this boat SELECT B.bid, COUNT (*) AS scount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid Grouping over a join of two relations. We cannot remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition. Only columns that appear in the Group-By can appear in HAVING, unless they are arguments of an aggregate operator in HAVING. 18 Basic Concepts of Indexing Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes used to look up records in a file. An index file consists of records (called index entries) of the form search-key pointer Index files are typically much smaller than the original file Two basic kinds of indices: Ordered indices: search keys are stored in sorted order Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”. 19 Ordered Indices In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library. Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called clustering index The search key of a primary index is usually but not necessarily the primary key Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index. Index-sequential file: ordered sequential file with a primary index (also called ISAM - indexed sequential access method). 20 Dense Index Files Dense index — Index record appears for every search-key value in the file (in this example we have clustering index). 21 Sparse Index Files Sparse Index: contains index records for only some search-key values. Applicable when records are sequentially ordered on search-key (i.e., only applicable to clustering indexes) To locate a record with search-key value K we: Find index record with largest search-key value <= K Search file sequentially starting at the record to which the index record points Less space and less maintenance overhead for insertions and deletions. Generally slower than dense index for locating records. Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block. 22 Example of Sparse Index Files 23 Multilevel Index If primary index does not fit in memory, access becomes expensive. To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it. outer index – a sparse index of primary index inner index – the primary index file If even outer index is too large to fit in main memory, yet another level of index can be created, and so on. Indices at all levels must be updated on insertion or deletion from the file. 24 Multilevel Index (Cont.) 25 B+-Tree Index A B+-tree is a rooted tree satisfying the following properties: All paths from root to leaf are of the same length (i.e., balanced tree) Each node has between n/2 and n pointers. Each leaf node stores between (n–1)/2 and n–1 values. n is called fanout (it corresponds to the maximum number of pointers/children). The value (n-1)/2 is called order (it corresponds to the minimum number of values). Special cases: If the root is not a leaf, it has at least 2 children. If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values. 26 Example of clustering (primary) B+-tree on candidate key 41 45 51 11 21 30 1 3 11 13 15 21 23 30 33 41 43 45 47 51 53 FILE W ITH REC O RDS record with search key 1 record with search key 3 This example corresponds to dense B+-tree index: Every search key value appears in a leaf node You may also have sparse B+-tree, e.g., entries in leaf nodes correspond to pages 27 Example of non-clustering (secondary) B+tree on candidate key 41 45 51 11 21 30 1 3 11 13 15 21 23 30 33 41 43 45 47 51 53 FILE W ITH REC O RDS record wit h record wit h search key 11 search key 3 record wit h search key 1 Should be alw ays dense 28 Example of clustering B+-tree on noncandidate key 41 45 51 11 21 30 1 3 11 13 15 21 23 30 33 41 43 45 47 FILE W ITH REC O RDS records wit h search key 1 record with search key 3 29 51 53 Example of non-clustering B+-tree on noncandidate key 41 45 51 11 21 30 1 3 11 13 15 point ers to records wit h search key 1 21 23 30 33 41 43 45 47 point ers to records wit h search key 3 FILE W ITH REC O RDS 30 51 53 Inserting a Data Entry into a B+ Tree Find correct leaf L. Put data entry onto L. If L has enough space, done! Else, must split L (into L and a new node L2) Redistribute entries evenly, copy up middle key. Insert index entry pointing to L2 into parent of L. This can happen recursively To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at top. 31 Deleting a Data Entry from a B+ Tree Start at root, find leaf L where entry belongs. Remove the entry. If L is at least half-full, done! If L less that half-full, Try to re-distribute, borrowing from sibling (adjacent node to the right). If re-distribution fails, merge L and sibling. If merge occurred, must delete entry (pointing to L or sibling) from parent of L. Merge could propagate to root, decreasing height. 32 B+-tree Update Examples Consider the B+-tree below with order 2 (each node except for the root must contain at least two search key values – and 3 pointers). Show the tree that would result after successively applying each of the following operations. 41 45 51 11 21 30 1 3 11 13 15 21 23 30 33 41 43 45 47 Remove 1 51 53 Remove 41 41 45 51 13 21 30 3 11 13 15 21 23 30 33 41 43 33 45 47 51 53 B+-tree Update Examples (cont) After removing 41 30 43 51 13 21 3 11 13 15 21 23 30 33 43 45 47 Remove 3 51 53 Insert 41 21 30 43 51 11 13 15 21 23 30 33 43 45 47 34 51 53 B+-tree Update Examples (cont) After inserting 41 21 30 43 51 3 11 13 15 21 23 30 33 41 51 53 43 45 47 Insert 1 30 43 51 13 21 1 3 11 13 15 21 23 30 33 41 35 43 45 47 51 53 Bulk Loading of a B+ Tree If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. Bulk Loading can be done much more efficiently. Initialization: Sort all data entries (using external sorting), insert pointer to first (leaf) page in a new (root) page. Root 3* 4* Sorted pages of data entries; not yet in B+ tree 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 36 38* 41* 44* Bulk Loading (Cont.) Root Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up rightmost path to the root.) 3* 10 20 Data entry pages 6 4* 6* 9* 12 23 35 not yet in B+ tree 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44* Much faster than repeated inserts! Root 20 10 6 3* 4* 6* 9* 12 Data entry pages not yet in B+ tree 35 23 38 10* 11* 12* 13* 20*22* 23* 31* 35* 36* 38*41* 44* 37 Hash Indices Hashing can be used not only for file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure. Strictly speaking, hash indices are always secondary indices if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary. 38 Example of Hash Index 39 Hash Functions In the worst case, the hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file. Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file. Typical hash functions perform computation on the internal binary representation of the search-key. For example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned. 40 External Sorting (disk-resident files) Merging Sorted Files with 3 pages of main memory buffer sorted file 2 merged file (1,…) (2,…) (1,…) (5,…) (4,…) (2,…) (7,…) (6,…) (4,…) (11,…) (9,…) (5,…) (12,…) (10,…) (6,…) (15,…) (14,…) (7,…) (20,…) (17,…) (9,…) (21,…) (18,…) (11,…) sorted file 1 … 41 External Sorting (disk-resident files) sorted file 1 sorted file 2 merged file (1,…) (2,…) (1,…) (5,…) (4,…) (2,…) (7,…) (6,…) (4,…) (11,…) (9,…) (5,…) bring next page of file 1 write page to disk bring next page of file 2 (6,…) (12,…) (10,…) (15,…) (14,…) (9,…) (20,…) (17,…) (10,…) (21,…) (18,…) (11,…) (7,…) 42 write page to disk External Sorting (disk-resident files) Continuing the previous example: Question: I assumed that each file is already sorted. If the file is not sorted, how do I sort it (using only my 3 buffer pages?) Answer: Each file in the example is only 2 pages. Therefore, I can bring the entire file in memory, and sort it using any main-memory algorithm. Question: The previous example assumes two separate files. How do I apply this idea to sort a single file? Answer: You can split the file in two parts and merge them as if they were separate files. Question: Can I do better if I have M>3 main memory pages. Answer: Yes, instead of 2 you can merge up to M-1 files (because you need 1 page for writing the output). 43 Sorting a file that has 108 pages, using only 5 pages of main memory. 1 2 1 2 5 108 PASS 0 5 6 7 sorted run 1 PASS 1 10 ...... sorted run 2 run 1 run 2 run 3 run 4 sorted run 22 run 21 ...... 4x5=20 pages sorted run 6 run 1 run 2 run 3 run 4 run 5 run 6 80 pages 28 pages sorted run 1 PASS 3 run 22 8 pages sorted run 1 PASS 2 106 107 108 sorted run 2 run 1 run 2 108 pages sorted file 44 Basic Steps in Query Processing Parsing and translation translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations Evaluation The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query. 45 Basic Steps in Query Processing : Optimization A relational algebra expression may have many equivalent expressions E.g., balance2500(balance(account)) is equivalent to balance(balance2500(account)) Each relational algebra operation can be evaluated using one of several different algorithms Correspondingly, a relational-algebra expression can be evaluated in many ways. Annotated expression specifying detailed evaluation strategy is called an evaluation-plan. E.g., can use an index on balance to find accounts with balance < 2500, or can perform complete relation scan and discard accounts with balance 2500 46 Selection Operation File scan – search algorithms that locate and retrieve records that fulfill a selection condition. Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition. Cost estimate (number of disk blocks scanned) = br br denotes number of blocks containing records from relation r If selection is on a key attribute, cost = (br /2) stop on finding record Linear search can be applied regardless of selection condition or ordering of records in the file, or availability of indices 47 Selection Operation (Cont.) A2 (binary search). Applicable if selection is an equality comparison on the attribute on which file is ordered. Assume that the blocks of a relation are stored contiguously Cost estimate (number of disk blocks to be scanned): log2(br) — cost of locating the first tuple by a binary search on the blocks Plus number of blocks containing records that satisfy selection condition 48 Selections Using Indices Index scan – search algorithms that use an index selection condition must be on search-key of index. A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition Cost = HTi + 1 (HTi is the height of the tree index) A4 (primary index on nonkey, equality) Retrieve multiple records. Records will be on consecutive blocks Cost = HTi + number of blocks containing retrieved records A5 (equality on search-key of secondary index). Retrieve a single record if the search-key is a candidate key Cost = HTi + 1 (if hash index HTi = 1 or HTi = 1.2 if we assume that there exist overflow buckets) Retrieve multiple records if search-key is not a candidate key Cost = HTi + number of records retrieved Can be very expensive! each record may be on a different block one block access for each retrieved record 49 Selections Involving Comparisons Can implement selections of the form AV (r) or A V(r) by using a linear file scan or binary search, or by using indices in the following ways: A6 (primary index, comparison). (Relation is sorted on A) For A V(r) use index to find first tuple v and scan relation sequentially from there For AV (r) just scan relation sequentially till first tuple > v; do not use index A7 (secondary index, comparison). For A V(r) use index to find first index entry v and scan index sequentially from there, to find pointers to records. For AV (r) just scan leaf pages of index finding pointers to records, till first entry > v In either case, retrieve records that are pointed to requires an I/O for each record Linear file scan may be cheaper if many records are to be fetched! 50 Join Operation Several different algorithms to implement joins Block nested-loop join Indexed nested-loop join Merge-join Hash-join 51 Block Nested-Loop Join Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation. for each block Br of r do begin for each block Bs of s do begin for each tuple tr in Br do begin for each tuple ts in Bs do begin Check if (tr,ts) satisfy the join condition if they do, add tr • ts to the result. end end end end 52 Block Nested-Loop Join (Cont.) Worst case estimate: br bs + br block accesses. Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation Best case: br + bs block accesses. Improvements to nested loop and block nested loop algorithms: In block nested-loop, use M — 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output Cost = br / (M-2) bs + br Optimizations: If equi-join attribute forms a key or inner relation, stop inner loop on first match Scan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement) Use main-memory hash table for the outer relation (to decrease CPU cost) 53 Indexed Nested-Loop Join Index lookups can replace file scans if join is an equi-join or natural join and an index is available on the inner relation’s join attribute Can construct an index just to compute a join. For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr. Cost of the join: br + nr c Where c is the cost of traversing index and fetching all matching s tuples for one tuple or r c can be estimated as cost of a single selection on s using the join condition. If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation. 54 Merge-Join Sort both relations on their join attribute (if not already sorted on the join attributes). Merge the sorted relations to join them Join step is similar to the merge stage of the sort-merge algorithm. Main difference is handling of duplicate values in join attribute — every pair with same value on join attribute must be matched 55 Merge-Join (Cont.) Can be used only for equi-joins and natural joins Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory) Thus number of block accesses for merge-join is br + bs + the cost of sorting if relations are unsorted. 56 Hash-Join Applicable for equi-joins and natural joins. A hash function h is used to partition tuples of both relations h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partitions of r tuples Each tuple tr r is put in partition ri where i = h(tr [JoinAttrs]). s0,, s1. . ., sn denotes partitions of s tuples Each tuple ts s is put in partition si, where i = h(ts [JoinAttrs]). 57 Hash-Join (Cont.) 58 Hash-Join (Cont.) r tuples in ri need only to be compared with s tuples in si Need not be compared with s tuples in any other partition, since: an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes. If that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si. 59 Hash-Join Algorithm The hash-join of r and s is computed as follows. 1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition. 2. Partition r similarly. 3. For each i: (a) Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h. (b) Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes. Relation s is called the build input and r is called the probe input. 60 Hash-Join algorithm (Cont.) The number of partitions n and the hash function h is chosen such that each si should fit in memory. In order for each partition of the build input to fit in memory: n ≥ bs/M Also M ≥ n+1 because for each partition we should have one buffer page (plus one page for input buffer) In order to satisfy these conditions: M > sqrt(bs) The probe relation partitions need not fit in memory Recursive partitioning required if number of partitions n is greater than number of pages M of memory. Rarely required: e.g., recursive partitioning not needed for relations of 1GB or less with memory size of 2MB, with block size of 4KB. 61