chapters 3 4 5 6 7 & transaction protocols

Report
SQL
What is SQL?
• DML
• DDL
• Partially declarative
• Based on the algebra via the tuple calculus, and therefore
its core has an elegant set-theoretic foundations
• Provides a sound foundation for mathematically precise
optimization
• Has very simple structure; optimized for moving large
numbers of tuples at once
• Set-based, retrieval-based
What are relational databases?
Tables and SQL, a programming language engineered for
high volume data applications
“must always be correct” data management apps
transaction-based applications
non-network, non-object based data
Chapter 3: The syntax of SQL
SELECT select_list
[FROM table_source]
[WHERE search_condition]
[ORDER BY order_by_list]
[LIMIT row_limit]
Details of the Select clause
SELECT [ALL|DISTINCT]
column_specification [[AS] result_column]
[, column_specification [[AS] result_column]]
 All columns in a base table
 Column name in a base table
 Calculation
 Function
Details of Select, continued
• Select * means all columns (attributes)
• Using arithmetic
• SELECT invoice_total - payment_total – credit_total
AS balance_due
• Using a function
• SELECT CONCAT(first_name, ' ', last_name) AS full_name
• Renaming a column
• SELECT invoice_number AS "Invoice Number",
invoice_date AS Date, invoice_total AS Total
FROM invoices
Operator precedence
Operator
*
/
DIV
% (MOD)
+
-
Name
Order of precedence
Multiplication
1
Division
1
Integer division
1
Modulo (remainder)
1
Addition
2
Subtraction
2
Calculations & concatenation & strings
SELECT invoice_total, payment_total, credit_total,
invoice_total - payment_total - credit_total
AS balance_due
FROM invoices
SELECT vendor_city, vendor_state, CONCAT(vendor_city, vendor_state)
FROM vendors
SELECT vendor_contact_first_name,vendor_contact_last_name
CONCAT(LEFT(vendor_contact_first_name, 1),
LEFT(vendor_contact_last_name, 1)) AS initials
FROM vendors
Date function, round function
SELECT invoice_date,
DATE_FORMAT(invoice_date, '%m/%d/%y') AS 'MM/DD/YY',
DATE_FORMAT(invoice_date, '%e-%b-%Y') AS 'DD-Mon-YYYY'
FROM invoices
SELECT invoice_date, invoice_total,
ROUND(invoice_total) AS nearest_dollar,
ROUND(invoice_total, 1) AS nearest_dime
FROM invoices
Where clause format
WHERE [NOT] search_condition_1 {AND|OR}
[NOT] search_condition_2 ...
WHERE NOT (invoice_total >= 5000
OR NOT invoice_date <= '2011-08-01')
WHERE invoice_total < 5000
AND invoice_date <= '2011-08-01'
In phrase and nested selects
WHERE test_expression [NOT] IN
({subquery|expression_1 [, expression_2]...})
WHERE vendor_id IN
(SELECT vendor_id
FROM invoices
WHERE invoice_date = '2011-07-18')
Like clause
WHERE vendor_city LIKE 'SAN%'
Cities that will be retrieved
“San Diego”, “Santa Ana”
WHERE vendor_name LIKE 'COMPU_ER%'
Vendors that will be retrieved
“Compuserve”, “Computerworld”
Order By clause
SELECT vendor_name,
CONCAT(vendor_city, ', ', vendor_state, ' ',
vendor_zip_code) AS address
FROM vendors
ORDER BY vendor_name
Chapter 4: Manipulating multiple tables
• Join
• Equijoin
• Natural join
• Outer join
• Self join
• N-way join
Explicit versus implicit joins
SELECT vendor_name, invoice_number, invoice_date,
line_item_amount, account_description
FROM vendors v
JOIN invoices i
ON v.vendor_id = i.vendor_id
JOIN invoice_line_items li
ON i.invoice_id = li.invoice_id
JOIN general_ledger_accounts gl
ON li.account_number = gl.account_number
WHERE invoice_total - payment_total - credit_total > 0
ORDER BY vendor_name, line_item_amount DESC
SELECT invoice_number, vendor_name
FROM vendors v, invoices i
WHERE v.vendor_id = i.vendor_id
ORDER BY invoice_number
Left outer join and natural join
SELECT vendor_name, invoice_number, invoice_total
FROM vendors LEFT JOIN invoices
ON vendors.vendor_id = invoices.vendor_id
ORDER BY vendor_name
SELECT invoice_number, vendor_name
FROM vendors
NATURAL JOIN invoices
ORDER BY invoice_number
Union operator
SELECT 'Active' AS source, invoice_number,
invoice_date, invoice_total
FROM active_invoices
WHERE invoice_date >= '2011-06-01'
UNION
SELECT 'Paid' AS source, invoice_number,
invoice_date, invoice_total
FROM paid_invoices
WHERE invoice_date >= '2011-06-01'
ORDER BY invoice_total DESC
Chapter 5: Aggregates in queries
AVG([ALL|DISTINCT] expression)
SUM([ALL|DISTINCT] expression)
MIN([ALL|DISTINCT] expression)
MAX([ALL|DISTINCT] expression)
COUNT([ALL|DISTINCT] expression)
COUNT(*) – counts nulls
Issues with aggregate data
• This is when we intentionally create a new form of “object”
and shift to ones that are deliberately not related to
individual tuples
• It’s selections and projections that often unintentionally
lose a perspective on object identity
Aggregate examples
SELECT COUNT(*) AS number_of_invoices,
SUM(invoice_total – payment_total – credit_total)
AS total_due
FROM invoices
WHERE invoice_total – payment_total – credit_total > 0
SELECT 'After 1/1/2011' AS selection_date,
COUNT(*) AS number_of_invoices, ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2011-01-01‘
SELECT 'After 1/1/2011' AS selection_date,
COUNT(*) AS number_of_invoices,
MAX(invoice_total) AS highest_invoice_total,
MIN(invoice_total) AS lowest_invoice_total
FROM invoices
WHERE invoice_date > '2011-01-01'
Aggregate examples, continued
SELECT MIN(vendor_name) AS first_vendor,
MAX(vendor_name) AS last_vendor,
COUNT(vendor_name) AS number_of_vendors
FROM vendors
SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,
COUNT(vendor_id) AS number_of_invoices,
ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2011-01-01'
Having clause (with group by)
SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_list]
[HAVING search_condition]
[ORDER BY order_by_list]
Group by and Having: complexities
• Group by: how to vertically group tuples
• Having: which groups will be included in the final result
• Note: All of this is after the Where clause is applied
• Group by: based on columns or expressions that have
columns in them
• If there are any calculations done in the Select clause,
this happens after the Group by clause; i.e., it is
performed for each group that results from the Group by
• Group by can be nested if you specify more than one
column
• Order by operators are performed after the Having
Having example, group by example
SELECT vendor_id, ROUND(AVG(invoice_total), 2)
AS average_invoice_amount
FROM invoices
GROUP BY vendor_id
HAVING AVG(invoice_total) > 2000
ORDER BY average_invoice_amount DESC
(Note: 2 is the number of decimals in the result.)
SELECT vendor_name,
COUNT(*) AS invoice_qty,
ROUND(AVG(invoice_total),2) AS invoice_avg
FROM vendors JOIN invoices
ON vendors.vendor_id = invoices.vendor_id
WHERE invoice_total > 500
GROUP BY vendor_name
ORDER BY invoice_qty DESC
These are the same…
SELECT invoice_date,
COUNT(*) AS invoice_qty,
SUM(invoice_total) AS invoice_sum
FROM invoices
GROUP BY invoice_date
HAVING invoice_date BETWEEN '2011-05-01' AND '2011-05-31'
AND COUNT(*) > 1
AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC
SELECT invoice_date, COUNT(*) AS invoice_qty,
SUM(invoice_total) AS invoice_sum
FROM invoices
WHERE invoice_date BETWEEN '2011-05-01' AND '2011-05-31'
GROUP BY invoice_date
HAVING COUNT(*) > 1 AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC
We will shift our focus a bit…
• We’ll run the actually queries that appear in chapters 5
onward
• The slides will contain overview material and not the
actual queries
Chapter 6: subqueries
• Often used to pass an aggregate value to a parent query
• Often a good way to book-keep what might have been a
very complex WHERE clause, with perhaps a multiway
join
• A good way to make a query look more readable to
someone who uses it later
• Subqueries can also be reused in other queries
• Note: you cannot use the SELECT attributes from the
embedded query in the parent query unless it directly
references the appropriate table(s) in the outer FROM
clause
Subqueries, continued
• Important tool: IN operator, which is a set “element of”
•
•
•
•
operator (written with an epsilon), and you can negate the
IN (not IN)
Often you use SOME or ALL or ANY, if you want to return
multiple values (i.e., multiple tuples, perhaps with only
one attribute each)
The default if you don’t use one of the operators above is
to return a single value
ALL is a set “for all elements of” operator (written with an
upside down A)
ANY is a set “there exists” operator (written with a
backward E)
Subqueries, continued
• The default is that a subquery only executes once, but
you can use a “correlated” query so that it will run once for
each row processed by the parent query.
• This breaks the “execute from the inside out” paradigm on an
uncorrelated subquery.
• EXISTS is often used with correlated queries
• You can put a subquery in a HAVING, FROM, or SELECT
clause as well
• But such queries get messy and we will skip this for now.
• It’s a good idea to write and test subqueries independently
whenever possible, unless they are trivial
Examples from chapter 6
• 7: passing a single value
• 6: ANY
• 3: IN
• 8: NOT EXISTS
• 5: ALL
Chapter 7: changing the database state
• This is when we need a transaction protocol
• Updates must never overlap with each other or with read-
only queries
• Read-only queries can overlap
• But we want to increase through put by supporting as
much “concurrency” as possible
• Each transaction has the potential to update the DB state
2 Phase Transactions
• Each SQL program is within a begin and end transaction pair
• Each transaction has its own workspace for DB items it is going
•
•
•
•
•
•
to update
Any transactions that overlap in execution time will appear to
have run in some serial order
This is done by transactions requesting read and write locks
(also known as shared and exclusive locks)
Read locks can be shared with other readers
Write locks cannot be shared with readers or writers
All locks held until the end of the transaction
They are released and then the changes that a transaction has
made are moved to the DB
Serializability of transactions
• The net effect is that the transactions that overlap in
execution time appear to have run in some serial order
• Transactions can be undone by throwing away the local
store (conceptually, at least)
• The write period at the end of the transaction must be
atomic
• The two phases:
• Request read, write, upgrade locks (and wait on locks) and process
• Release locks and move updates to the DB
• There is a notion of “serializability” which means that the
actual schedule of executed steps corresponds to some
serial order of running the transactions
Interesting facts on transactions…
• Various legal schedules might produce different results
• A crash during phase two can lead the database
•
•
•
•
inconsistent
The transaction manager uses a lot of overhead
resources handling locks
We still need to be able to roll the database back and
rerun transaction logs
User must control the nature of overlapping transactions
or there might be very little true concurrency
In a distributed database, the lock manager is a
bottleneck because all components of the database must
move in lockstep
Updating data: changing the DB state
INSERT INTO invoices VALUES
(115, 97, '456789', '2011-08-01', 8344.50, 0, 0, 1, '2011-08-31', NULL);
INSERT INTO invoices VALUES
(116, 97, '456701', '2011-08-02', 270.50, 0, 0, 1, '2011-09-01', NULL),
(117, 97, '456791', '2011-08-03', 4390.00, 0, 0, 1, '2011-09-02', NULL),
(118, 97, '456792', '2011-08-03', 565.60, 0, 0, 1, '2011-09-02', NULL);
Updating tables, continued
USE ex;
INSERT INTO color_sample (color_number)
VALUES (606);
INSERT INTO color_sample (color_name)
VALUES ('Yellow');
INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, 'Orange');
INSERT INTO color_sample
VALUES (DEFAULT, 808, NULL);
INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, NULL);
Updating tables, continued
INSERT INTO invoice_archive
SELECT *
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;
INSERT INTO invoice_archive
(invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
payment_total, terms_id, invoice_date, invoice_due_date)
SELECT
invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
payment_total, terms_id, invoice_date, invoice_due_date
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;
Updating, continued
UPDATE invoices
SET payment_date = '2011-09-21',
payment_total = 19351.18
WHERE invoice_number = '97/522';
UPDATE invoices
SET terms_id = 1
WHERE vendor_id = 95;
UPDATE invoices
SET credit_total = credit_total + 100
WHERE invoice_number = '97/522';
Updating, continued
DELETE FROM general_ledger_accounts
WHERE account_number = 306;
DELETE FROM invoice_line_items
WHERE invoice_id = 78 AND invoice_sequence = 2;
DELETE FROM invoice_line_items
WHERE invoice_id = 12;
DELETE FROM invoice_line_items
WHERE invoice_id IN
(SELECT invoice_id
FROM invoices
WHERE vendor_id = 115);

similar documents