### 05. quicksort

Algorithm Design and Analysis (ADA)
242-535, Semester 1 2014-2015

**Objective:** describe the quicksort algorithm, its partition function, and analyse its running time under different data conditions.
**Overview**

1. Quicksort
2. Partitioning Function
3. Analysis of Quicksort
4. Quicksort in Practice
#### 1. Quicksort

- Proposed by Tony Hoare in 1962.
- Voted one of the top 10 algorithms of the 20th century in science and engineering: http://www.siam.org/pdf/news/637.pdf
- A divide-and-conquer algorithm.
- Sorts "in place": it rearranges elements using only the array, as insertion sort does, unlike merge sort, which uses extra storage.
- Very practical (after some code tuning).
**Divide and conquer**

Quicksort an n-element array:

1. **Divide:** Partition the array into two subarrays around a pivot x such that elements in the lower subarray ≤ x ≤ elements in the upper subarray.
2. **Conquer:** Recursively sort the two subarrays.
3. **Combine:** Nothing to do.

Key: implementing a linear-time partitioning function.
**Pseudocode**

```
quicksort(int[] A, int left, int right)
    if (left < right)                  // the subarray has 2 or more items
        pivot = partition(A, left, right)
        // recursively sort elements smaller than the pivot
        quicksort(A, left, pivot-1)
        // recursively sort elements bigger than the pivot
        quicksort(A, pivot+1, right)
```
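The pseudocode can be sketched as runnable Python (a minimal sketch: the Lomuto-style `partition` here matches the PARTITION function covered in section 2, and the sample array is purely illustrative):

```python
def partition(A, left, right):
    """Lomuto partition around A[left]; returns the pivot's final index."""
    x = A[left]                        # pivot
    i = left                           # boundary of the "<= pivot" region
    for j in range(left + 1, right + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]    # move the small element left
    A[left], A[i] = A[i], A[left]      # put the pivot between the regions
    return i

def quicksort(A, left, right):
    if left < right:                   # 2 or more items
        p = partition(A, left, right)
        quicksort(A, left, p - 1)      # elements <= pivot
        quicksort(A, p + 1, right)     # elements > pivot

data = [6, 10, 13, 5, 8, 3, 2, 11]    # illustrative input
quicksort(data, 0, len(data) - 1)
print(data)  # [2, 3, 5, 6, 8, 10, 11, 13]
```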
**Quicksort diagram** (figure omitted: the array is partitioned around the pivot, then each side is sorted recursively.)
**Fine Tuning the Code**

- quicksort stops when the subarray has 0 or 1 elements.
- When a subarray becomes small, switch over to dedicated sorting code rather than relying on recursion.
- quicksort is tail-recursive, a recursive behaviour which can be optimized.
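The small-subarray switch can be sketched as follows (a sketch, not the lecture's code: the cutoff of 16 is an illustrative value, and the best threshold is machine-dependent):

```python
CUTOFF = 16  # illustrative threshold; tune per machine

def insertion_sort(A, left, right):
    """Sort the small subarray A[left..right] in place."""
    for i in range(left + 1, right + 1):
        key = A[i]
        j = i - 1
        while j >= left and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def partition(A, left, right):
    """Lomuto partition around A[left]; returns the pivot's final index."""
    x = A[left]
    i = left
    for j in range(left + 1, right + 1):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[left], A[i] = A[i], A[left]
    return i

def tuned_quicksort(A, left, right):
    if right - left + 1 <= CUTOFF:
        insertion_sort(A, left, right)   # small subarray: no recursion
        return
    p = partition(A, left, right)
    tuned_quicksort(A, left, p - 1)
    tuned_quicksort(A, p + 1, right)
```

Insertion sort's low constant factors beat quicksort's recursion overhead once subarrays are tiny, which is why production sorts typically use a hybrid like this.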
**Tail-Call Optimization**

- Tail-call optimization avoids allocating a new stack frame for a called function.
  - The new frame isn't necessary because the calling function does nothing but return the value it gets from the called function.
- The most common use of this technique is optimizing tail recursion: the recursive function can be rewritten to use a constant amount of stack space (instead of linear).
**Tail-Call Graphically** (diagrams omitted: before the optimization, each recursive call adds a new stack frame; after it, the same frame is reused.)
**Pseudocode**

Before:

```
int foo(int n) {
    if (n == 0)
        return A();
    else {
        int x = B(n);
        return foo(x);
    }
}
```

After:

```
int foo(int n) {
    if (n == 0)
        return A();
    else {
        int x = B(n);
        goto start of foo() code
            with x as argument value
    }
}
```
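In a language without guaranteed tail-call optimization (Python, for example), the same transformation can be applied by hand as a loop. A sketch, using hypothetical stand-ins for the helpers `A` and `B`:

```python
def A():
    return 0          # hypothetical base-case computation

def B(n):
    return n - 1      # hypothetical step computation

# Tail-recursive version: uses O(n) stack frames.
def foo_recursive(n):
    if n == 0:
        return A()
    return foo_recursive(B(n))

# After eliminating the tail call: constant stack space.
def foo_loop(n):
    while n != 0:
        n = B(n)      # reuse the same "frame" by rebinding n
    return A()
```

`foo_loop(1_000_000)` runs fine, while the recursive version would exhaust Python's default recursion limit long before that.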
#### 2. Partitioning Function

```
PARTITION(A, p, q)            // partitions A[p..q]
    x ← A[p]                  // pivot = A[p]
    i ← p                     // boundary index
    for j ← p + 1 to q
        if A[j] ≤ x then
            i ← i + 1         // move the i boundary
            exchange A[i] ↔ A[j]   // switch big and small
    exchange A[p] ↔ A[i]
    return i                  // return index of pivot
```

Running time = O(n) for n elements.
**Example of partitioning** (diagrams omitted; the captions describe the steps):

1. Scan right until finding something less than the pivot.
2. Swap 10 and 5.
3. Resume the scan right until finding something less than the pivot.
4. Swap 13 and 3.
5. Swap 10 and 2.
6. j runs to the end.
7. Swap the pivot and 2, so the pivot lands in the middle.
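The steps above can be reproduced with an instrumented partition (a sketch: the array `[6, 10, 13, 5, 8, 3, 2, 11]` is an assumed example chosen to be consistent with the swaps listed, not taken from the slides):

```python
def partition_verbose(A, p, q):
    x = A[p]                          # pivot
    i = p
    for j in range(p + 1, q + 1):
        if A[j] <= x:
            i += 1
            if i != j:
                print(f"swap {A[i]} and {A[j]}")
            A[i], A[j] = A[j], A[i]
    print(f"swap pivot and {A[i]}")
    A[p], A[i] = A[i], A[p]
    return i

A = [6, 10, 13, 5, 8, 3, 2, 11]       # pivot = 6
p = partition_verbose(A, 0, len(A) - 1)
# prints: swap 10 and 5 / swap 13 and 3 / swap 10 and 2 / swap pivot and 2
print(A, "pivot at index", p)         # [2, 5, 3, 6, 8, 13, 10, 11] pivot at index 3
```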
#### 3. Analysis of Quicksort

- The analysis is quite tricky.
- Assume all the input elements are distinct.
  - Having no duplicate values makes this code faster!
  - There are better partitioning algorithms when duplicate input elements exist (e.g. Hoare's original code).
- Let T(n) = the worst-case running time on an array of n elements.
**3.1. Worst-case of quicksort**

- QUICKSORT runs very slowly when its input array is already sorted (or reverse sorted).
  - Almost-sorted data is quite common in the real world.
- The cause: the partition pivots on the minimum (or maximum) element, so one side of the partition has no elements and the other has n-1 elements. Therefore:

```
T(n) = T(0) + T(n-1) + Θ(n)
     = Θ(1) + T(n-1) + Θ(n)
     = T(n-1) + Θ(n)
     = Θ(n²)              (arithmetic series)
```
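A quick way to see the quadratic behaviour empirically (a sketch that counts the comparisons made by the first-element-pivot partition; the input sizes are illustrative):

```python
import random

def quicksort_counted(A, left, right, counter):
    """First-element-pivot quicksort that tallies partition comparisons."""
    if left < right:
        x = A[left]
        i = left
        for j in range(left + 1, right + 1):
            counter[0] += 1            # one comparison per loop iteration
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[left], A[i] = A[i], A[left]
        quicksort_counted(A, left, i - 1, counter)
        quicksort_counted(A, i + 1, right, counter)

n = 500
for name, data in [("sorted", list(range(n))),
                   ("shuffled", random.sample(range(n), n))]:
    c = [0]
    quicksort_counted(data, 0, n - 1, c)
    print(name, c[0])
# sorted input costs exactly n(n-1)/2 = 124750 comparisons;
# shuffled input costs far fewer (roughly proportional to n log n)
```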
**Worst-case recursion tree**

T(n) = T(0) + T(n-1) + cn

Expanding the recurrence gives a lopsided tree: each level does cn, c(n-1), c(n-2), ... work down one spine, with a Θ(1)-cost T(0) node hanging off the other side at every level. (Tree diagrams omitted.)

The tree has height n, so the total work is

```
cn + c(n-1) + ... + c·1 + n·Θ(1) = Θ(n²)
```
**Quicksort isn't Quick?**

- In the worst case, quicksort isn't any quicker than insertion sort.
- So why bother with quicksort?
- Its average-case running time is very good, as we'll see.
**3.2. Best-case Analysis**

If we're lucky, PARTITION splits the array evenly:

```
T(n) = 2T(n/2) + Θ(n)
     = Θ(n log n)         (Case 2 of the Master Method; same as merge sort)
```
**3.3. Almost Best-case**

What if the split is always 1/10 : 9/10?

```
T(n) = T(n/10) + T(9n/10) + Θ(n)
```
**Analysis of the "almost-best" case**

Expanding T(n) = T(n/10) + T(9n/10) + cn gives a recursion tree whose levels each cost at most cn: the root splits into T(n/10) and T(9n/10), the next level into T(n/100), T(9n/100), T(9n/100), and T(81n/100), and so on. (Tree diagrams omitted.)

- The shortest root-to-leaf path always takes the 1/10 branch; the longest always takes the 9/10 branch.
- Total work is at least cn × (short path length) and at most cn × (long path length), plus the work at the leaves.
**Short and Long Path Heights**

- Short path node values: n → (1/10)n → (1/10)²n → ... → 1, so after sp steps:

```
n(1/10)^sp = 1   ⟹   n = 10^sp    ⟹   sp = log_10 n       // take logs
```

- Long path node values: n → (9/10)n → (9/10)²n → ... → 1, so after lp steps:

```
n(9/10)^lp = 1   ⟹   n = (10/9)^lp   ⟹   lp = log_{10/9} n   // take logs
```

Both path lengths are Θ(log n), so there are Θ(log n) levels costing at most cn each: T(n) = Θ(n log n).
**Alternating good and bad partitions**

Suppose partitions alternate between good (even split) and bad (worst-case split):

```
G(n) = 2B(n/2) + Θ(n)     // good step
B(n) = G(n – 1) + Θ(n)    // bad step
```

Solving:

```
G(n) = 2( G(n/2 – 1) + Θ(n/2) ) + Θ(n)
     = 2G(n/2 – 1) + Θ(n)
     = Θ(n log n)
```

Good! How can we make sure we choose good partitions?
**Randomized Quicksort**

IDEA: Partition around a random element.

- The running time is then independent of the input order, and no assumptions need to be made about the input distribution.
- No specific input elicits the worst-case behavior.
- The worst case is determined only by the output of a random-number generator.
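A minimal sketch of the idea, reusing the same Lomuto-style partition but choosing the pivot uniformly at random:

```python
import random

def randomized_quicksort(A, left, right):
    if left < right:
        # pick a random pivot and move it to the front,
        # then partition exactly as before
        r = random.randint(left, right)
        A[left], A[r] = A[r], A[left]
        x = A[left]
        i = left
        for j in range(left + 1, right + 1):
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[left], A[i] = A[i], A[left]
        randomized_quicksort(A, left, i - 1)
        randomized_quicksort(A, i + 1, right)
```

On an already-sorted array, which is a worst case for the fixed first-element pivot, the random pivot keeps the expected recursion depth at O(log n).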
#### 4. Quicksort in Practice

- Quicksort is a great general-purpose sorting algorithm, especially with a randomized pivot.
  - Quicksort can benefit substantially from code tuning.
  - A tuned quicksort can be over twice as fast as merge sort.
- Quicksort behaves well even with caching and virtual memory.