Modern Performance 2014 Oslo

Modern Performance - SQL Server
Joe Chang
www.qdpma.com
Jchang6 @ yahoo
About Joe
• SQL Server consultant since 1999
• Query Optimizer execution plan cost formulas (2002)
• True cost structure of SQL plan operations (2003?)
• Database with distribution statistics only, no data (2004)
• Decoding statblob/stats_stream
– writing your own statistics
• Disk IO cost structure
• Tools for system monitoring, execution plan analysis
See http://www.qdpma.com/
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx
Overview
• General SQL Server Performance
• Why is performance still important today?
– Brute force?
• Yes, but …
• Special Topics – spectacular fails
• Automating data collections
• SQL Server Engine
– What do developers/DBAs need to know?
Not in this session
• List of rules to be followed blindly
– without consideration for the underlying reason
– and whether the rule actually applies in the current circumstance
DBA skill: cause and effect analysis & assessment
Common Themes?
• execution plan
– Very large (multiple orders of magnitude) error in row estimate
• Single (execute) of large operation
– Might still be tolerable
• Multiple (executes) of large operations
select
a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol, a.Split_rep, a.Sales_Person_Name
,cast(sum(a.January) as float) as January
,cast(sum(a.February) as float) as February
,cast(sum(a.March) as float) as March
,cast(sum(a.April) as float) as April
,cast(sum(a.May) as float) as May
,cast(sum(a.June) as float) as June
,cast(sum(a.July) as float) as July
,cast(sum(a.August) as float) as August
,cast(sum(a.September) as float) as September
,cast(sum(a.October) as float) as October
,cast(sum(a.November) as float) as November
,cast(sum(a.December) as float) as December
,cast(sum(a.Total) as float) as Total
from(
select
cast(hdr.Header as varchar(100)) as Header
,cast(AcctSec.CUSIP as varchar(100)) as CUSIP
,cast(AcctSec.Sec_No as varchar(100)) as SecNo
,cast(AcctSec.Sec_Desc1 as varchar(100)) as Security
,cast(AcctSec.Symbol as varchar(100)) as Symbol
,case when RefMonth.[MonthName] = 'January'
then fct.Comm else 0 end as January
,case when RefMonth.[MonthName] = 'February'
then fct.Comm else 0 end as February
,case when RefMonth.[MonthName] = 'March'
then fct.Comm else 0 end as March
,case when RefMonth.[MonthName] = 'April'
then fct.Comm else 0 end as April
,case when RefMonth.[MonthName] = 'May'
then fct.Comm else 0 end as May
,case when RefMonth.[MonthName] = 'June'
then fct.Comm else 0 end as June
,case when RefMonth.[MonthName] = 'July'
then fct.Comm else 0 end as July
,case when RefMonth.[MonthName] = 'August'
then fct.Comm else 0 end as August
,case when RefMonth.[MonthName] = 'September'
then fct.Comm else 0 end as September
,case when RefMonth.[MonthName] = 'October'
then fct.Comm else 0 end as October
,case when RefMonth.[MonthName] = 'November'
then fct.Comm else 0 end as November
,case when RefMonth.[MonthName] = 'December'
then fct.Comm else 0 end as December
,fct.Comm as Total
,AcctEmp.split_rep
,AcctEmp.Sales_Person_Name
from PayoutSystemDW.[dbo].[PS_FactAccountSummary] fct
join PayoutSystemDW.dbo.PS_DimensionRptBus RptBus
on fct.DimRptBusID = RptBus.DimRptBusID
join PayoutSystemDW.dbo.PS_DimensionHeader hdr
on fct.DimHeaderID = hdr.DimHeaderID
join PayoutSystemDW.dbo.PS_DimensionCurrency cur
on fct.DimCurID = cur.DimCurID
and cur.DimCurID = 1
join PayoutSystemDW.dbo.PS_DimensionAcctEmp AcctEmp on fct.DimAcctEmpID = acctemp.DimAcctEmpID and AcctEmp.Empno = 8125
and AcctEmp.Split_rep in ('PB54')
join PayoutSystemDW.dbo.PS_DimensionAcctSec AcctSec
on fct.DimAcctSecID = AcctSec.DimAcctSecID
join PayoutSystemDW.dbo.PS_DimensionRefBuySell bs
on fct.DimRefBuySellID = bs.DimRefBuySellID
join PayoutSystemDW.[dbo].[PS_DimensionAcctOrg] AcctOrg on fct.DimAcctOrgID = AcctOrg.DimAcctOrgID and AcctOrg.OrgCode in ('38C')
join PayoutSystemDW.[dbo].[PS_DimensionAcctClt] as AcctClt on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID
and AcctClt.ClientName = 'BRACY DENNIS M'
join PayoutSystemDW.dbo.PS_DimensionTradeInd ti on ti.DimTradeIndID = fct.DimTradeIndID and ti.[Trade_Ind_Year] = 2014
join PayoutSystemDW.dbo.PS_DimensionRefMonth RefMonth on RefMonth.MonthID = ti.Trade_Ind_Month
where RptBus.ReportID = 1
) a
group by
a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol,a.Split_rep,a.Sales_Person_Name
select fct.Comm as Total, …
From FactAccountSummary fct
join DimensionAcctClt as AcctClt
on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID
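The join predicate above compares AcctClt.DimAcctCltID with itself, so it is true for every row with a non-NULL key and does nothing to relate the dimension to the fact table. A corrected join might look like the sketch below. It assumes the fact table has a DimAcctCltID column, by analogy with the other Dim...ID join columns; the original query does not show that column.

```sql
-- Assumption: PS_FactAccountSummary has a DimAcctCltID column (hypothetical,
-- inferred from the naming pattern of the other join columns).
select fct.Comm as Total
from PayoutSystemDW.dbo.PS_FactAccountSummary fct
join PayoutSystemDW.dbo.PS_DimensionAcctClt as AcctClt
    on fct.DimAcctCltID = AcctClt.DimAcctCltID   -- fact-to-dimension, not dimension-to-itself
    and AcctClt.ClientName = 'BRACY DENNIS M'
```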
CPU & Memory 2001 versus 2014
[Figure: block diagrams of a 2001 4-socket system and a 2014 4-socket Xeon E7 v2 system, with an E3 die for comparison. Labels include: cores C0-C14, LLC, MI, QPI, PCI-E, DMI 2, PCH, GFX, MC, MCH, FSB, P, L2]
2001 – 4 sockets, 4 cores
Pentium III Xeon, 900MHz
4-8GB memory?
Xeon MP 2002-4
Each core today is more than 10x over Pentium III (700MHz?)
Xeon E7 v2 (Ivy Bridge), 15 cores, 3 QPI
4 x 15 = 60 cores
3TB (96 x 32GB), 24 DIMMs per socket
40 PCI-E gen3 lanes + x4 g2 / socket
Mem     2013    2014
16GB    $191    $180
32GB    $794    $650
64GB    -       $4510
CPU & Memory 2001 versus 2012
[Figure: block diagrams of a 2001 4-socket system and a 2012 4-socket Xeon E5 system. Labels include: cores C0-C7, LLC, MI, QPI, PCI-E, DMI 2, MCH, FSB, P, L2]
2001 – 4 sockets, 4 cores
Pentium III Xeon, 900MHz
4-8GB memory?
Xeon MP 2002-4
Each core today is more than 10x over Pentium III (700MHz?)
Xeon E5 (Sandy Bridge), 8 cores, 2 QPI
4 x 8 = 32 cores total
Westmere-EX: 1TB (64 x 16GB) (3 QPI)
Sandy Bridge E5: 768GB (48 x 16GB) (2 QPI)
Mem     2013    2014
16GB    $191    $180
32GB    $794    $650
64GB    -       $4510
Intel E5 & E7 v2 (Ivy-Bridge)
E3 v3
[Figure: processor die diagrams. Labels include: GFX, MC, DMI, x4 PCI-E links, PCH]
Processor – Core
Microprocessor Pipeline
[Figure: pipeline timing diagram; two instructions (1st, 2nd) pass through the same stages]
3GHz: 0.33ns clock
5 ns from start to finish (equivalent to 200MHz)
Pipeline stages:
BP – Branch Predict
IF – Instruction Fetch
ID – Decode
RAT – Register Allocate & Rename
ROB – Re-Ordering Buffer
Sch – Schedule
Exec – Execute
Flags
Retire
Microprocessor (core) is (multi-lane) assembly line
Each core is superscalar
Processor (socket) has multiple cores
System has multiple sockets
Micro-architecture Sandy-Bridge
Haswell (Xeon E5/7 v3)
CPU Access Times
[Figure: one core with two logical processors (Logical 0, Logical 1), L1 I and L1 D caches, unified L2, L3 slice, DRAM]
Core – 3.33GHz, 1 CPU cycle = 0.3ns
L1 cache – 4 CPU cycles (1ns)
L2 cache – 12 CPU cycles (4ns?)
L3 cache – 29+ cycles
Local node memory
– 28 cycles + 49 ns (open page)
– 28 cycles + 56 ns (random page)
Remote node (1-hop) memory – 28 cycles + 100ns
2-hop – 150-300ns+?
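To put the cycle counts and nanosecond figures above on one scale, the query below converts cycles to time at the 3.33GHz clock quoted on the slide and adds the fixed DRAM component. It is only an arithmetic sketch; the column names are illustrative, and the "29+" L3 figure is taken as 29.

```sql
-- Convert the cycle counts quoted above to approximate nanoseconds at 3.33GHz.
DECLARE @GHz float = 3.33;   -- core clock from the slide (cycles per ns)

SELECT v.AccessLevel,
       v.Cycles,
       v.ExtraNs,
       ROUND(v.Cycles / @GHz + v.ExtraNs, 1) AS ApproxNs
FROM (VALUES
    ('L1 cache',                   4,   0),
    ('L2 cache',                  12,   0),
    ('L3 cache',                  29,   0),
    ('Local memory, open page',   28,  49),
    ('Local memory, random page', 28,  56),
    ('Remote memory, 1 hop',      28, 100)
) AS v(AccessLevel, Cycles, ExtraNs);
```

The spread from L1 (a little over 1 ns) to remote memory (over 100 ns) is roughly two orders of magnitude.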
Latency Orders of Magnitude
[Figure: latency hierarchy from core through L1 cache, L2 cache, LLC and memory, shown on a processor die diagram. Labels include: GFX, MC, DMI, x4 PCI-E links, PCH]
Core – 3.33GHz, 1 CPU cycle = 0.3ns
L1 cache – 4 CPU cycles (1ns)
L2 cache – 12 CPU cycles (4ns?)
L3 cache (LLC) – 29+ cycles
Local node memory
– 28 cycles + 49 ns (open page)
– 28 cycles + 56 ns (random page)
Remote node (1-hop) memory – 28 cycles + 100ns
2-hop – 150-300ns+?
Westmere-EX 8-Socket System
[Figure: Westmere-EX 8-socket system diagram. Labels include: eight 10-core sockets (C0-C9) with LLC and memory controllers (MC), SMB memory buffers, QPI links between sockets, IOH 0 to IOH 3 with PCI-E x8 and x4 slots, ESI, PCH]
Large server systems are very complicated
Software developed without consideration for system architecture will likely have severe problems
This applies to the OS, SQL Server and the application
Storage 2001 versus 2012/13
[Figure: storage configurations in 2001 and 2013. Labels include: MCH, PCI RAID controllers, HDD, 192 GB memory, QPI, PCIe x8 and x4 slots, RAID controllers, SSD, 10GbE, IB]
2001
• 100 x 10K HDD
• 125 IOPS each = 12.5K IOPS
• IO bandwidth limited: 1.3GB/s (1/3 memory bandwidth)
2013
• 64 SSDs, 10K+ IOPS each, 1M IOPS total possible
• 10-20GB/s+ IO bandwidth easy
• 6.4GB/s on each PCIe G3 x8
• SAN vendors – questionable BW
http://www.qdpma.com/Storage/Storage2013.html
http://www.qdpma.com/ppt/Storage_2013.pptx
SAN
[Figure: SAN configurations. Labels include: Node 1 and Node 2 (768 GB and 1024 GB memory), PCIe x8 HBAs, 8 Gb FC or 10Gb FCoE, switches, SP A and SP B (24 GB), x4 SAS 2GB/s, 0.8 GB/s, 2GB/s, SSDs, auto-tier pools (10K, 7.2K), hot spares, Data 1 to Data 16, SSD 1 to SSD 4, Log 1 to Log 4]
http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx
http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx
Performance Past, Present, Future
• When will servers be so powerful that …
– Been saying this for a long time
• Today – 10 to 100X overkill
– 32-cores in 2012, 60-cores in 2014
– Enough memory that IO is only sporadic
– Unlimited IOPS with SSD
• What can go wrong?
Today’s topic
SQL Performance
[Figure: diagram linking the elements of SQL performance around the Execution Plan]
• SQL
• Tables – natural keys
• Indexes
• Statistics & compile parameters
• Query Optimizer
• Compile – row estimate propagation errors
• Recompile – temp table / table variable
• Execution Plan
• DOP, memory – parallel plans
• Storage Engine
• Hardware
• API Server Cursors: open, prepare, execute, close?
• SET NOCOUNT – information messages
• Index & Stats Maintenance
Tables and SQL combined implement business logic
Natural keys with unique indexes, not SQL
Index and Statistics maintenance policy
1 logic may need more than one execution plan?
Compile cost versus execution cost?
Plan cache bloat?
The Execution Plan links all the elements of performance
Index tuning alone has limited value
Over-indexing can cause problems as well
Factors to Consider
• SQL
• Tables
• Indexes
• Statistics
• Query Optimizer
• Compile parameters
• DOP, memory
• Storage Engine
• Hardware
Special Topics
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
– Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• OR, AND/OR combinations, IN/NOT IN, EXISTS
• Complex Query with sub-expressions
• Parallel Execution
Not in order of priority
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
1a. Data type mismatch
DECLARE @name nvarchar(25) = N'Customer#000002760'
SELECT * FROM CUSTOMER WHERE C_NAME = @name
Table column is varchar
Parameter/variable is nvarchar
Unable to use index seek
.NET auto-parameter discovery?
Fix: convert the parameter to the column type
SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar(25), @name)
1b. Type Mismatch – Row Estimate
SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%'
SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%'
Row estimate error could have severe consequences in a complex query
SELECT TOP + Row Estimate Error
SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269)
AND EXISTS (
SELECT [DocumentBatch].[BatchArtifactID]
FROM [DocumentBatch] (NOLOCK)
INNER JOIN [Batch] (NOLOCK)
ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
WHERE
[DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
AND [Batch].[Name] LIKE N'%Value%'
)
ORDER BY [Document].[ArtifactID]
Data type mismatch – estimated rows come out too high
TOP clause – with a high estimate, the optimizer expects to find the first 1000 rows easily
In fact, there are few rows that match the SARG
Result: the wrong plan for evaluating a large number of rows
http://www.qdpma.com/CBO/Relativity.html
MULTIPLE OPTIONAL SARG
2. Multiple Optional SARG
DECLARE @Orderkey int, @Partkey int = 1
SELECT * FROM LINEITEM
WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey)
AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)
IF block
DECLARE @Orderkey int, @Partkey int = 1 -- these are actually the stored procedure parameters
IF (@Orderkey IS NOT NULL)
SELECT * FROM LINEITEM
WHERE (L_ORDERKEY = @Orderkey)
AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
ELSE IF (@Partkey IS NOT NULL)
SELECT * FROM LINEITEM
WHERE (L_PARTKEY = @Partkey)
Need to consider impact of Parameter Sniffing
Consider the OPTIMIZE FOR hint
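As a minimal sketch of the OPTIMIZE FOR hint, the two statements below show its two forms against the same LINEITEM query. The literal 184826 is only an illustrative value borrowed from a later example in this deck, not a recommendation.

```sql
DECLARE @Partkey int = 1;

-- Form 1: compile for average density instead of the sniffed value
SELECT * FROM LINEITEM
WHERE L_PARTKEY = @Partkey
OPTION (OPTIMIZE FOR (@Partkey UNKNOWN));

-- Form 2: compile as if a chosen representative value had been passed
SELECT * FROM LINEITEM
WHERE L_PARTKEY = @Partkey
OPTION (OPTIMIZE FOR (@Partkey = 184826));
```

Which form is appropriate depends on whether one value is representative of most calls; with heavily skewed data neither may be, and separate code paths are then worth considering.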
Dynamically Built Parameterized SQL
DECLARE @Orderkey int, @Partkey int = 1
, @SQL nvarchar(500), @Param nvarchar(100)
SELECT @SQL =
N'/* Comment */
SELECT * FROM LINEITEM WHERE 1=1'
, @Param = N'@Orderkey int, @Partkey int'
IF (@Orderkey IS NOT NULL)
SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey'
IF (@Partkey IS NOT NULL)
SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey'
PRINT @SQL
exec sp_executesql @SQL, @Param, @Orderkey, @Partkey
IF block is easier for few options
Dynamically built parameterized SQL better for many options
Consider /*comment*/ to help identify source of SQL
2b. Function on column SARG
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31'
DECLARE @Startdate date, @Days int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd,1,@Startdate)
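The BETWEEN '1995-01-01' AND '1995-01-31' form is only exact when L_SHIPDATE is a date column with no time portion. A half-open range, sketched below against the same LINEITEM table, leaves the column free of functions and also covers datetime columns.

```sql
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= '19950101'   -- first day of the month, inclusive
  AND L_SHIPDATE <  '19950201';  -- first day of the next month, exclusive
```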
Estimated versus Actual Plan - rows
Estimated Plan – 1 row???
Actual Plan – actual rows 77,356
3 Parameter Sniffing
-- first call, procedure compiles with these parameters
exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31'
-- subsequent calls, procedure executes with original plan
exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07'
Assuming date data type
Need different execution plans for narrow and wide range
Options:
1) OPTIMIZE FOR – one plan for all ranges
2) WITH RECOMPILE – compile on each execute
3) main procedure calls 1 of 2 identical sub-procedures
One sub-procedure is only called for narrow range
Other called for wide range
Skewed data distributions also important
Example: Large & small customers
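Option 3 might be sketched as below. The sub-procedure names, the dbo.ReportData table with its ReportDate column, and the 31-day cutoff are all hypothetical, chosen only to illustrate the pattern; the real report query would replace the SELECT in both sub-procedures.

```sql
-- Hypothetical names and cutoff; the two sub-procedures are deliberately identical.
CREATE PROCEDURE dbo.p_Report_Narrow @startdate date, @enddate date
AS
    SELECT COUNT(*) AS RowCnt FROM dbo.ReportData
    WHERE ReportDate BETWEEN @startdate AND @enddate;
GO
CREATE PROCEDURE dbo.p_Report_Wide @startdate date, @enddate date
AS
    SELECT COUNT(*) AS RowCnt FROM dbo.ReportData
    WHERE ReportDate BETWEEN @startdate AND @enddate;
GO
CREATE PROCEDURE dbo.p_Report @startdate date, @enddate date
AS
    -- Each sub-procedure is only ever compiled with its own kind of range,
    -- so each keeps a cached plan suited to that range.
    IF DATEDIFF(DAY, @startdate, @enddate) <= 31
        EXEC dbo.p_Report_Narrow @startdate, @enddate;
    ELSE
        EXEC dbo.p_Report_Wide @startdate, @enddate;
GO
```

Compared with WITH RECOMPILE, this avoids the compile cost on every call while still keeping two distinct plans.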
STATISTICS
4 Statistics
• Auto-recompute points
• Sampling strategy
– How much to sample - theory?
– Random pages versus random rows
– Histogram Equal and Range Rows
– Out of bounds, value does not exist
– etc.
Statistics Used by the Query Optimizer in SQL Server 2008
Eric N. Hanson and Yavor Angelov, Contributor: Lubor Kollar
http://msdn.microsoft.com/en-us/library/dd535534.aspx
Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
Joseph Sack
Statistics Structure
• Stored (mostly) in binary field
– Scalar values
– Density Vector – limit 30, half in NC, half Cluster key
– Histogram – up to 200 steps
Consider not blindly using IDENTITY on critical tables
Example: Large customers get low ID values
Small customers get high ID values
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx
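The three parts of the structure can be inspected with DBCC SHOW_STATISTICS, as in the sketch below. The dbo schema and the statistics name IX_LINEITEM_PARTKEY are assumptions for illustration; substitute an index or statistics name that exists on the table.

```sql
-- 'IX_LINEITEM_PARTKEY' is a hypothetical index/statistics name.
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_PARTKEY') WITH STAT_HEADER;     -- scalar values
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_PARTKEY') WITH DENSITY_VECTOR;  -- density per key prefix
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_PARTKEY') WITH HISTOGRAM;       -- up to 200 steps
```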
Statistics Auto/Re-Compute
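• Automatically generated on query compile
• Recompute thresholds: 6 modifications (small temp tables), 500 modifications (tables up to 500 rows), then 500 + 20% of rows?
– Has this changed? (2008 R2)
– Trace flag 2371 – lowers the auto-recompute threshold for large tables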
http://support.microsoft.com/kb/2754171
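To see how close each statistics object is to a recompute point, the modification counter can be compared with the row count. The query below is a sketch that assumes a build where sys.dm_db_stats_properties is available (it was added in service packs for 2008 R2 and 2012).

```sql
SELECT OBJECT_NAME(s.object_id) AS TableName,
       s.name                   AS StatsName,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter   -- changes to the leading column since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;
```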
Statistics Sampling
• Sampling theory
– True random sample
– Sample error ~ $\sqrt{N}$
• Relative error ~ $1/\sqrt{N}$
• SQL Server sampling
– Random pages
• But always first and last page???
– All rows in selected pages
Row Estimate Problems (at source)
• Skewed data distribution
• Out of bounds
• Value does not exist
Row estimate errors at source – is classified under statistics topic
Loop Join - Table Scan on Inner Source
Estimated output from the first 2 tables (at right) is
zero or 1 rows. The most efficient join to the third
table (without an index on the join column) is then a loop
join with a scan. If the actual row count is 2 or more, then
a full scan is performed for each row from the
outer source
Default statistics rules may lead to serious ETL issues
Consider custom strategy
Compile Parameter Not Exists
Main procedure has cursor around view_Servers
First server in view_Servers is 'CAESIUM'
Cursor executes sub-procedure for each Server
sql:
SELECT MAX(ID) FROM TReplWS
WHERE Hostname = @ServerName
But CAESIUM does not exist in TReplWS!
Good and Bad Plan?
SqlPlan Compile Parameters
<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0)
FROM TReplWS WHERE Hostname = @ServerName"
StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1"
StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2"
StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true"
ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
<QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168">
<RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar"
EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007"
AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
</RelOp>
<ParameterList>
<ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
</ParameterList>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
Compile parameter values at bottom of sqlplan file
AND – OR, IN / NOT IN, EXISTS /
NOT EXISTS COMBINATIONS
5a Single Table OR
-- Single table
SELECT * FROM LINEITEM
WHERE L_ORDERKEY = 1
OR L_PARTKEY = 184826
5a Join 2 Tables, OR in SARG
-- Two tables, OR across columns from both
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
FROM LINEITEM
INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099
5a UNION (ALL) instead of OR
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826
UNION (ALL)
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = 137099
-- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) -- Hugo Kornelis trick
Caution: select list should have keys to ensure correct rows
UNION removes duplicates (with Sort operation)
UNION ALL does not
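Written out in full, the UNION ALL variant with the exclusion predicate enabled looks like the sketch below. The second branch skips rows the first branch has already returned, so the result should match the OR form without the Sort that UNION needs to remove duplicates.

```sql
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826
UNION ALL
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = 137099
  AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL);  -- exclude rows already returned by the first branch
```

The IS NULL test matters only if L_PARTKEY is nullable; without it, rows with a NULL part key would be dropped from the second branch.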
5b AND/OR Combinations
• Hash Join is good method to process many rows
– Requirement is equality join condition
– SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
• AND/OR, IN NOT IN, EXISTS NOT EXISTS
combinations
– Query optimizer may not be able to determine that an equality
join condition exists
– Execution plan will use a loop join,
– and an attempt to force a hash join will be rejected
• Re-write using UNION in place of OR
• And LEFT JOIN in place of NOT IN
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html
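The LEFT JOIN rewrite of a NOT IN can be sketched as follows, using the TPC-H CUSTOMER and ORDERS tables as an illustrative case (customers with no orders). The two forms are equivalent only when the subquery column, here O_CUSTKEY, contains no NULLs, which is assumed below.

```sql
-- Original form:
--   SELECT C_CUSTKEY, C_NAME FROM CUSTOMER
--   WHERE C_CUSTKEY NOT IN (SELECT O_CUSTKEY FROM ORDERS)

-- Rewrite: explicit equality join, then keep the rows that found no match.
SELECT c.C_CUSTKEY, c.C_NAME
FROM CUSTOMER AS c
LEFT JOIN ORDERS AS o
       ON o.O_CUSTKEY = c.C_CUSTKEY
WHERE o.O_CUSTKEY IS NULL;
```

The explicit equality predicate in the ON clause gives the optimizer a join condition it can use for a hash join.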
COMPLEX QUERIES
Complex Queries
• High Compile effort
– Many joins, Many indexes
– Estimated plan cost correlation
• Row estimation errors after multiple
operations
Row estimate errors at source – is classified under statistics topic
Complex Query with Sub-expression
• Query complexity – really high compile cost
• Repeating sub-expressions (including CTE)
– Must be evaluated multiple times
• Main Problem - Row estimate error propagation
• Solution/Strategy – Get a good execution plan
– Temp table when estimate is high, actual is low
– When estimate is low and actual rows are high, need to balance temp
table insert overhead versus plan benefit. Would a join hint work?
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
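The temp table strategy can be sketched as below: the sub-expression is evaluated once into a temp table, and the remainder of the query is then compiled against its actual row count. The tables are the TPC-H ORDERS and LINEITEM used elsewhere in this deck; the temp table name and the filter are illustrative.

```sql
-- Step 1: materialize the sub-expression once.
SELECT O_ORDERKEY, O_ORDERDATE
INTO #Orders
FROM ORDERS
WHERE O_CUSTKEY = 137099;

-- Step 2: the rest of the query joins to the temp table, which has its own statistics.
SELECT o.O_ORDERDATE, o.O_ORDERKEY, l.L_SHIPDATE, l.L_QUANTITY
FROM #Orders AS o
INNER JOIN LINEITEM AS l
        ON l.L_ORDERKEY = o.O_ORDERKEY;

DROP TABLE #Orders;
```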
More Plan Details
Query joining 6 tables
Each table has too many indexes
Row estimate is high – plan cost is high
Query optimizer tries really, really hard to find a better plan
Actual row count is moderate, any plan works
Temp Table and Table Variable
• Forget what other people have said
– Most of it is not reliable
• Temp Tables – subject to statistics auto/re-compile
• Table variable – no statistics, assumes 1 row
• Question: In each specific case: does the statistics
and recompile help or not?
– Yes: temp table
– No: table variable
Is this still true?
Row Estimate Error after Join
IO – synchronous when estimated rows < 25, asynchronous when > 25
Row Estimate 2
Parallelism
• Designed for 1998 era
– Cost Threshold for Parallelism: default 5
– Max Degree of Parallelism – instance level
– OPTION (MAXDOP n) – query level
• Today – complex system – 32 cores
– Plan cost 5 query might run in 10ms?
– Some queries at DOP 4
– Others at DOP 16?
– Really need to rethink parallelism / NUMA strategies
More on Parallelism:
http://www.qdpma.com/CBO/ParallelismComments.html
http://www.qdpma.com/CBO/ParallelismOnset.html
Tables with computed columns may inhibit parallelism?
Number of concurrently running queries x DOP less than number of logical/physical processors?
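The three settings listed above are changed as sketched below. The values 50, 8 and 4 are placeholders for illustration, not recommendations; suitable values depend on the workload and the core/NUMA layout.

```sql
-- Instance level (illustrative values only)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- Query level: overrides the instance setting for this statement
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
OPTION (MAXDOP 4);
```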
Parallel Execution – or not?
Tables with computed columns
using UDF prevent parallelism
Full-Text Search
Loop join with full-text search as inner source
Full-text search potentially executed many times
varchar(max) stored in lob pages
• Disk IO to lob pages is synchronous?
– Must access row to get 16 byte link?
– Feature request: index pointer to lob
SQL PASS 2013
Understanding Data Files at the Byte Level
Mark Rasmussen
Legacy
• API Server Cursors / Cursor Stored Procedures
– sp_prepare / sp_prepexec, sp_execute, sp_unprepare
– sp_cursoropen, sp_cursorfetch, sp_cursorclose
– sp_cursorprepare / sp_cursorprepexec, sp_cursorexecute, sp_cursorunprepare
• Guess which is not called?
– Symptom: sp_reset_connection
http://technet.microsoft.com/en-us/library/ms187088(v=sql.105).aspx
http://technet.microsoft.com/en-us/library/ms187801(v=sql.120).aspx
API Server Cursors
Cursor Stored Procedures
Summary
• Hardware today is really powerful
– Storage may not be – SAN vendor disconnect
• Standard performance practice
– Top resource consumers, index usage
• But also look for serious blunders
http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html
http://www.qdpma.com/CBO/Relativity.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
Kevin Boles – Common TSQL Mistakes
Thank you to our sponsors
Special Topics
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
– Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• AND/OR
• Complex Query
• Parallel Execution
SQL Server Edition Strategies
• Enterprise Edition – per core licensing costs
– Old system strategy
• 4 (or 2)-socket server, top processor, max memory
– Today: How many cores are necessary
• 2 socket system, max memory (16GB DIMMs)
• Is standard edition adequate
– Low cost, but many important features disabled
• BI edition – 16 cores
– Limited to 64GB for SQL Server process
New Features in SQL Server
• 2005
– Index included columns
– Partitioning
– CLR
• 2008
– Filtered index
– Compression
• 2012
– Column store (non-clustered)
• 2014
– Column store clustered
– Hekaton
General Performance
GENERAL PERFORMANCE
SQL Performance General
• Client-side architecture
– Connection pooling
– stored procedures versus SQL, parameterized
• Database Architecture
– Cluster key, primary key, natural keys, foreign keys
• SQL –
• Indexing
• Indexes & Statistics Maintenance
Client-side Architecture
• Connection pooling:
– Connection.Open, Execute, Connection.Close
– Sp_reset_connection
• Stored procedures – parameterized SQL
– Stored procedure name is short
– Parameterized SQL may not be
• Larger than 1 Ethernet packet? 2?, 8?
Database Architecture
• Normalization
• Cluster key
• Primary Key & other unique / natural keys
• Foreign keys
Principles
Data
Testing
Server
Storage
Network