Binary Trees

Report
Chapter 6: Binary Trees
Objectives
Looking ahead – in this chapter, we’ll consider
• Trees, Binary Trees, and Binary Search Trees
• Implementing Binary Trees
• Searching a Binary Search Tree
• Tree Traversal
• Insertion
• Deletion
Data Structures and Algorithms in C++, Fourth Edition
2
Objectives (continued)
•
•
•
•
•
•
Balancing a Tree
Self-Adjusting Trees
Heaps
Treaps
k-d Trees
Polish Notation and Expression Trees
Data Structures and Algorithms in C++, Fourth Edition
3
Trees, Binary Trees, and
Binary Search Trees
• While linked lists, stacks, and queues are useful data
structures, they do have limitations
– Linked lists are linear in form and cannot reflect hierarchically
organized data
– Stacks and queues are one-dimensional structures and have limited
expressiveness
• To overcome these limitations, we’ll consider a new data
structure, the tree
• Trees consist of two components, nodes and arcs (or edges)
• Trees are drawn with the root at the top, and “grow” down
– The leaves of the tree (also called terminal nodes) are at the bottom
of the tree
Data Structures and Algorithms in C++, Fourth Edition
4
Trees, Binary Trees, and
Binary Search Trees (continued)
• Trees can be defined recursively as follows:
1.
2.
3.
A tree with no nodes or arcs (an empty structure) is an empty tree
If we have a set t1… tk of disjoint trees, the tree whose root has the
roots of t1… tk as its children is a tree
Only structures generated by rules 1 and 2 are trees
• Every node in the tree must be accessible from the root
through a unique sequence of arcs, called a path
• The number of arcs in the path is the path’s length
• A node’s level is the length of the path to that node, plus 1
Data Structures and Algorithms in C++, Fourth Edition
5
Trees, Binary Trees, and
Binary Search Trees (continued)
• The maximum level of a node in a tree is the tree’s height
• An empty tree has height 0, and a tree of height 1 consists of
a single node which is both the tree’s root and leaf
• The level of a node must be between 1 and the tree’s height
• Some examples of trees are shown in Figure 6.1
Fig. 6.1 Some examples of trees
Data Structures and Algorithms in C++, Fourth Edition
6
Trees, Binary Trees, and
Binary Search Trees (continued)
• The number of children of a given node can be arbitrary
• Using trees may also to improve the process of searching for
elements
• In order to find a particular element in a list of n elements, we
have to examine all those before that element in the list
– This holds even if the list is ordered
• On the other hand, if the elements of a list are stored in a tree
that is organized in a predetermined fashion, the number of
elements that must be looked at can be substantially reduced
Data Structures and Algorithms in C++, Fourth Edition
7
Trees, Binary Trees, and
Binary Search Trees (continued)
• The order of nodes in the figure below doesn’t achieve
anything, because there is no consideration of searching
incorporated into its design
• However, by applying a consistent ordering to the nodes,
considerable savings in searching can be achieved
Fig. 6.3 Transforming (a) a linked list into (b) a tree
Data Structures and Algorithms in C++, Fourth Edition
8
Trees, Binary Trees, and
Binary Search Trees (continued)
• A binary tree is a tree where each node has only two children,
designated the left child and the right child
• These children can be empty; Figure 6.4 shows examples of
binary trees
Fig. 6.4 Examples of binary trees
• An important attribute of binary trees is the number of leaves
• This is useful in assessing efficiency of algorithms
Data Structures and Algorithms in C++, Fourth Edition
9
Trees, Binary Trees, and
Binary Search Trees (continued)
• As specified earlier, the level of a node is the number of arcs
between it and the root, plus 1
• The root is at level 1, its children at level 2, etc.
• So if each node at any given level (except the last) had two
children, there would be 20 nodes at level 1, 21 nodes at level
2, etc.
• In general, there would be 2i nodes at level i + 1
• A tree that exhibits this is called a complete binary tree
• In such a tree, all nonterminal nodes have both children, and
all leaves are on the same level
Data Structures and Algorithms in C++, Fourth Edition
10
Trees, Binary Trees, and
Binary Search Trees (continued)
• Because leaves can occur throughout this tree (except at level
1), there is no general formula to calculate the number of
nodes
• However, it can be approximated:
– For all the nonempty binary trees whose nonterminal nodes have
exactly two nonempty children, the number of leaves m is greater
than the number of nonterminal nodes k and m = k + 1
– This holds trivially for the tree consisting of only the root
Data Structures and Algorithms in C++, Fourth Edition
11
Trees, Binary Trees, and
Binary Search Trees (continued)
• For any given tree for which the condition holds, attaching
two leaves to an existing leaf will make it nonterminal
• This decreases the leaf nodes by 1 and increases the number
of nonterminals by 1
• However, the two new leaves increase the number of leaves
by 2, so the relation becomes (m – 1) + 2 = (k + 1) + 1
• This simplifies to m = k + 1, which is the desired result
• This means that an i + 1 level complete decision tree has 2i
leaves and 2i – 1 nonterminal nodes, totaling 2i+1 – 1 nodes
Data Structures and Algorithms in C++, Fourth Edition
12
Trees, Binary Trees, and
Binary Search Trees (continued)
Fig. 6.5 (a) Adding a leaf to tree, (b) preserving the relation of
the number of leaves to the number of nonterminal nodes
Data Structures and Algorithms in C++, Fourth Edition
13
Trees, Binary Trees, and
Binary Search Trees (continued)
• In a binary search tree (or ordered binary tree), values stored
in the left subtree of a given node are less than the value
stored in that node, and values stored in the right subtree of a
given node are greater than the value stored in that node
– The values stored are considered unique; attempts to store duplicate values
can be treated as an error
– The meanings of the expressions “less than” and “greater than” will depend
on the types of values stored
Fig. 6.6 Examples of binary search trees
Data Structures and Algorithms in C++, Fourth Edition
14
Implementing Binary Trees
• We can use arrays or linked structures to implement binary trees
• If using an array, each element stores a structure that has an
information field and two “pointer” fields containing the indexes of
the array locations of the left and right children
• The root of the tree is always in the first cell of the array, and a
value of -1 indicates an empty child
Fig. 6.7 Array representation of the tree in Figure 6.6c
Data Structures and Algorithms in C++, Fourth Edition
15
Implementing Binary Trees (continued)
• Implementing binary tree arrays does have drawbacks
– We need to keep track of the locations of each node, and these have to be
located sequentially
– Deletions are also awkward, requiring tags to mark empty cells, or moving
elements around, requiring updating values
• Consequently, while arrays are convenient, we’ll usually use a linked
implementation
• In a linked implementation, the node is defined by a class, and
consists of an information data member and two pointer data
members
• The node is manipulated by methods defined in another class that
represents the tree
• The code for this is shown in Figure 6.8 on pages 220-222
Data Structures and Algorithms in C++, Fourth Edition
16
Searching a Binary Search Tree
• Locating a specific value in a binary tree is easy:
Fig. 6.9 A function for searching a binary search tree
• For each node, compare the value to the target value; if they
match, the search is done
• If the target is smaller, we branch to the left subtree; if larger,
we branch to the right
• If at any point we cannot proceed further, then the search has
failed and the target isn’t in the tree
Data Structures and Algorithms in C++, Fourth Edition
17
Searching a Binary Search Tree
(continued)
• Using this approach and referring to Figure 6.6c, we can find
the value 31 in only three comparisons
• Finding (or not finding) the values 26 – 30 requires the
maximum of four comparisons; all other values require less
than four
• This also demonstrates why a value should occur only once in
a tree; allowing duplicates requires additional searches:
– If there is a duplicate, we must either locate the first occurrence and
ignore the others, or
– We must locate each duplicate, which involves searching until we can
guarantee that no path contains another instance of the value
• This search will always terminate at a leaf node
Data Structures and Algorithms in C++, Fourth Edition
18
Searching a Binary Search Tree
(continued)
• The number of comparisons performed during the search
determines the complexity of the search
• This in turn depends on the number of nodes encountered on
the path from the root to the target node
• So the complexity is the length of the path plus 1, and is
influenced by the shape of the tree and location of the target
• Searching in a binary tree is quite efficient, even if it isn’t
balanced
• However, this only holds for randomly created trees, as those
that are highly unbalanced or elongated and resemble linear
linked lists approach sequential search times
Data Structures and Algorithms in C++, Fourth Edition
19
Tree Traversal
• Tree traversal is the process of visiting each node in a tree
data structure exactly one time
• This definition only specifies that each node is visited, but
does not indicate the order of the process
• Hence, there are numerous possible traversals; in a tree of n
nodes there are n! traversals
• Two especially useful traversals are depth-first traversals and
breadth-first traversals
Data Structures and Algorithms in C++, Fourth Edition
20
Tree Traversal (continued)
• Breadth-First Traversal
– Breadth-first traversal proceeds level-by-level from top-down or
bottom-up visiting each level’s nodes left-to-right or right-to-left
– This gives us four possibilities; a top-down, left-to-right breadth-first
traversal of Figure 6.6c yields 13, 10, 25, 2, 12, 20, 31, 29
– This can be easily implemented using a queue
– If we consider a top-down, left-to-right breadth-first traversal, we start
by placing the root node in the queue
– We then remove the node at the front of the queue, and after visiting
it, we place its children (if any) in the queue
– This is repeated until the queue is empty
Data Structures and Algorithms in C++, Fourth Edition
21
Tree Traversal (continued)
• Breadth-First Traversal (continued)
– An implementation of this is shown in Figure 6.10
Fig. 6.10 Top-down, left-to-right, breadth-first traversal implementation
Data Structures and Algorithms in C++, Fourth Edition
22
Tree Traversal (continued)
• Breadth-First Traversal (continued)
– The following diagram shows a traversal of the tree from Figure 6.6c,
using the queue-based breadth-first traversal
Tree
Queue
Output
13
10
25
13
25
2
12
2
12
20
12
20
31
20
31
13, 10
31
13, 10, 25
13, 10, 25, 2
13, 10, 25, 2, 12
31
13, 10, 25, 2, 12, 20
29
13, 10, 25, 2, 12, 20, 31
13, 10, 25, 2, 12, 20, 31, 29
The
figure 6.6c
The queue
queue (middle)
(middle) and
and output
output (right)
(right) from
from aa breadth-first
breadth-first traversal of the tree from Figure
6.6c (left).
(left)
Data Structures and Algorithms in C++, Fourth Edition
23
Tree Traversal (continued)
• Depth-First Traversal
– Depth-first traversal proceeds by following left- (or right-) hand
branches as far as possible
– The algorithm then backtracks to the most recent fork and takes the
right- (or left-) hand branch to the next node
– It then follows branches to the left (or right) again as far as possible
– This process continues until all nodes have been visited
– While this process is straightforward, it doesn’t indicate at what point
the nodes are visited; there are variations that can be used
– We are interested in three activities: traversing to the left, traversing
to the right, and visiting a node
• These activities are labeled L, R, and V, for ease of representation
Data Structures and Algorithms in C++, Fourth Edition
24
Tree Traversal (continued)
• Depth-First Traversal (continued)
– Based on earlier discussions, we want to perform the traversal in an
orderly manner, so there are six possible arrangements:
• VLR, VRL, LVR, LRV, RVL, and RLV
– Generally, we follow the convention of traversing from left to right,
which narrows this down to three traversals:
• VLR – known as preorder traversal
• LVR – known as inorder traversal
• LRV – known as postorder traversal
– These can be implemented straightforwardly, as seen in Figure 6.11
Data Structures and Algorithms in C++, Fourth Edition
25
Tree Traversal (continued)
Fig. 6.11 Depth-first traversal implementations
Data Structures and Algorithms in C++, Fourth Edition
26
Tree Traversal (continued)
• Depth-First Traversal (continued)
– While the code is simple, the power lies in the recursion supported by
the run-time stack, which places a heavy burden on the system
– To gain more insight into the behavior of these algorithms, let’s
consider the inorder routine
– In this traversal, if the tree is nonempty, we traverse the left subtree of
the node, then visit the node, then traverse the right subtree
Fig. 6.12 Inorder tree traversal
Data Structures and Algorithms in C++, Fourth Edition
27
Tree Traversal (continued)
• Depth-First Traversal (continued)
– Because of the order of the recursion in the code, the V and R steps
are held pending until the L step completes
– This is the function of the stack, to “remember” the backtrack point,
so that after a left traversal ends, the routine can back up to visit the
branch point node, and then proceed to the right
– This is illustrated in Figure 6.13, where each node is labeled with the
activities “LVR”, and they are scratched out as they are performed for a
given node
– To see how this works, we can observe the operation of the runtime
stack shown in Figure 6.14 on page 230; the numbers in parentheses
refer to return addresses indicated in the code on page 228
Data Structures and Algorithms in C++, Fourth Edition
28
Tree Traversal (continued)
Fig. 6.13 Details of several of the first steps of inorder traversal
Data Structures and Algorithms in C++, Fourth Edition
29
Tree Traversal (continued)
• Depth-First Traversal (continued)
– Now let’s consider nonrecursive implementations of the traversal
algorithms
– As we’ve learned, recursive algorithms tend to be less efficient than
their nonrecursive versions
– So we need to determine if it is useful to pursue nonrecursive versions
of the traversal algorithms
– Let’s first consider a nonrecursive version of the preorder algorithm,
shown in Figure 6.15
– While still readable, it makes extensive use of the stack, and the
number of calls in the processing loop is actually twice the number in
the recursive version of the code, which is hardly an improvement
Data Structures and Algorithms in C++, Fourth Edition
30
Tree Traversal (continued)
Fig. 6.15 A nonrecursive implementation of preorder tree traversal
Data Structures and Algorithms in C++, Fourth Edition
31
Tree Traversal (continued)
• Depth-First Traversal (continued)
– Recursive algorithms can easily be derived from one another by simply
transposing the function calls
– This is not the case with the nonrecursive algorithms, however; the
order of the calls and their interaction with the stack is critical
– So the inorder and postorder nonrecursive algorithms have to be
developed separately
– Fortunately, creating a postorder algorithm can be accomplished easily
by noting that an LRV traversal is simply a reversed VRL traversal
– This is a right-to-left preorder traversal, so we can adapt the preorder
algorithm to create the postorder one
– This will require two stacks to handle the reversal process from
preorder to postorder
Data Structures and Algorithms in C++, Fourth Edition
32
Tree Traversal (continued)
• Depth-First Traversal (continued)
– We can utilize a single stack, however, if we push the node based on
the number of descendants it has
– We can push the node once before traversing its left subtree, and then
again before traversing its right subtree
– An auxiliary pointer is used to keep track of the two cases
– Nodes with one descendant get pushed only once, and leaf nodes are
not put on the stack
– This approach is the basis for the code in Figure 6.16
– Inorder traversal is also complicated; the algorithm in Figure 6.17 is
both hard to follow and hard to understand without documentation
Data Structures and Algorithms in C++, Fourth Edition
33
Tree Traversal (continued)
Fig. 6.16 A nonrecursive implementation of postorder tree traversal
Data Structures and Algorithms in C++, Fourth Edition
34
Tree Traversal (continued)
Fig. 6.17 A nonrecursive implementation of inorder tree traversal
Data Structures and Algorithms in C++, Fourth Edition
35
Tree Traversal (continued)
• Stackless Depth-First Traversal: Threaded Trees
– The previous algorithms were all characterized by the use of a stack,
either implicitly through the system, or explicitly in code
– In both cases, additional processing time is required to handle stack
operations, and memory has to be allocated for the stack
– In extreme cases where the tree is highly skewed, this can be a serious
processing concern
– A more efficient implementation can be achieved if the stack is
incorporated into the design of the tree itself
– This is done by using threads, pointers to the predecessor and
successor of a node based on an inorder traversal
– Trees using threads are called threaded trees
Data Structures and Algorithms in C++, Fourth Edition
36
Tree Traversal (continued)
• Stackless Depth-First Traversal: Threaded Trees (continued)
– To implement threads, four pointers would be needed for each node,
but this can be reduced by overloading the existing pointers
– The left pointer can be used to point to the left child or the
predecessor, and the right pointer can point to the right child or
successor
– This is illustrated in Figure 6.18(a)
– The figure suggests that threads to both predecessors and successors
need to be used, but this is not always true
– Figure 6-18b shows the inorder traversal of a threaded tree, using only
successor threads
Data Structures and Algorithms in C++, Fourth Edition
37
Tree Traversal (continued)
• Stackless Depth-First Traversal: Threaded Trees (continued)
Fig. 6.18 (a) A threaded tree and (b) an inorder traversal’s path
in a threaded tree with right successors only
– The implementation of this is relatively simple; the traversal is
indicated by the dashed lines in Figure 6.18b
– Only a single variable is needed for this; no stack is required
– However, the memory savings will be highly dependent on the
implementation, shown in Figure 6-19 on pages 235 and 236
Data Structures and Algorithms in C++, Fourth Edition
38
Tree Traversal (continued)
• Stackless Depth-First Traversal: Threaded Trees (continued)
– We can also use threads to support preorder and postorder traversal
– In preorder, the existing threads can be used to determine the
appropriate successors
– Postorder requires somewhat more work, but is only slightly more
complicated to accomplish
Data Structures and Algorithms in C++, Fourth Edition
39
Tree Traversal (continued)
• Stackless Depth-First Traversal: Tree Transformation
– The approaches to traversal thus far considered have used stacks to
support the traversal or incorporated the stack into the tree
– Both of these have memory overhead that can impact the efficiency of
the algorithms
– However, it is possible to carry out traversals without using stacks or
threads
– These algorithms rely on making temporary changes in the tree
structure during traversal, and restoring the structure when done
– One elegant algorithm to accomplish this was developed by Joseph M.
Morris in 1979 and is shown here for inorder traversal
Data Structures and Algorithms in C++, Fourth Edition
40
Tree Traversal (continued)
• Stackless Depth-First Traversal: Tree Transformation (cont’d)
– The algorithm is based on the observation that inorder traversal is
very simple for trees that have no left children (see Figure 6.1e)
– Since no left subtree has to be considered, the LVR traversal reduces
to VR
– Morris’s algorithm utilizes this observation by modifying the tree so
that the node being processed has no left child
– This allows the node to be visited and then the right subtree can be
investigated
– Since this changes the tree’s structure, the traversal can only be done
once, and information must be kept to restore the original tree
Data Structures and Algorithms in C++, Fourth Edition
41
Tree Traversal (continued)
• Stackless Depth-First Traversal: Tree Transformation (cont’d)
– The algorithm can be described as follows:
MorrisInorder()
while not finished
if node has no left descendant
visit it;
go to the right;
else make this node the right child of the
rightmost node in its left descendant;
go to this left descendant;
Data Structures and Algorithms in C++, Fourth Edition
42
Tree Traversal (continued)
• Stackless Depth-First Traversal: Tree Transformation (cont’d)
– An implementation of this algorithm is shown in Figure 6.20
Fig. 6.20 Implementation of the Morris algorithm for inorder traversal
Data Structures and Algorithms in C++, Fourth Edition
43
Tree Traversal (continued)
• Stackless Depth-First Traversal: Tree Transformation (cont’d)
– Details of the traversal are shown in Figure 6-21 (page 239); letters for
the subfigures are referred to in the process steps on pages 237 and
238
– Preorder and postorder traversals can be implemented in a similar
fashion
– The preorder traversal requires moving the visit() operation from
the inner else to the inner if
– Postorder requires additional restructuring of the tree, as described on
page 239
Data Structures and Algorithms in C++, Fourth Edition
44
Insertion
• Searching a binary tree does not modify the tree
• Traversals may temporarily modify the tree, but it is usually left in
its original form when the traversal is done
• Operations like insertions, deletions, modifying values, merging
trees, and balancing trees do alter the tree structure
• We’ll look at how insertions are managed in binary search trees first
• In order to insert a new node in a binary tree, we have to be at a
node with a vacant left or right child
• This is performed in the same way as searching:
– Compare the value of the node to be inserted to the current node
– If the value to be inserted is smaller, follow the left subtree; if it is larger,
follow the right subtree
– If the branch we are to follow is empty, we stop the search and insert the
new node as that child
Data Structures and Algorithms in C++, Fourth Edition
45
Insertion (continued)
• This process is shown in Figure 6.22; the code to implement
this algorithm shown in Figure 6.23
Fig. 6.22 Inserting nodes into binary search trees
Data Structures and Algorithms in C++, Fourth Edition
46
Insertion (continued)
Fig. 6.23 Implementation of the insertion algorithm
Data Structures and Algorithms in C++, Fourth Edition
47
Insertion (continued)
• In looking at tree traversal, we considered three approaches:
stack-based, thread-based, and via transformations
• Stack based traversals don’t change the trees; transformations
change the tree but restore it when done
• Threaded approaches, though, do modify the tree by adding
threads to the structure
• While it may be possible to add and remove the threads as
needed, if the tree is processed frequently, we might want to
make the threads a permanent part of the tree
• This requires incorporating threads into the insertion process
Data Structures and Algorithms in C++, Fourth Edition
48
Insertion (continued)
• The algorithm for inserting a node in a threaded tree is a
simple modification of the original function that adjusts the
threads where needed
• The implementation of this algorithm is shown in Figure 6.24
on page 242; the first insertions are shown in Figure 6.25
Fig. 6.25 Inserting nodes into a threaded tree
Data Structures and Algorithms in C++, Fourth Edition
49
Deletion
• Deletion is another operation essential to maintaining a
binary search tree
• This can be a complex operation depending on the placement
of the node to be deleted in the tree
• The more children a node has, the more complex the deletion
process
• This implies three cases of deletion that need to be handled:
– The node is a leaf; this is the easiest case, because all that needs to be
done is to set the parent link to null and delete the node (Figure 6.26)
– The node has one child; also easy, as we set the parent’s pointer to the
node to point to the node’s child (Figure 6.27)
Data Structures and Algorithms in C++, Fourth Edition
50
Deletion (continued)
Fig. 6.26 Deleting a leaf
Fig. 6.27 Deleting a node with one child
– The third case, and most difficult to handle, is when the node has two
children, as there is no one-step process; we’ll consider two options
Data Structures and Algorithms in C++, Fourth Edition
51
Deletion (continued)
• Deletion by Merging
– The first approach, deletion by merging, works by making one tree out
of the node’s two subtrees and attaching it to the node’s parent
– This is accomplished by recalling that the value of every node in the
right subtree is greater than the value of every node in the left subtree
– So the rightmost node of the left subtree will be the largest value in
that subtree, and it will become the parent of the right subtree
– To do this, we start at the root of the left subtree and follow right links
until we encounter a node with an empty right pointer
– This node will then set that pointer to the right subtree, and the
parent of the left subtree is promoted to replace the deleted node
– This entire operation is shown in Figure 6.28; Figure 6.29 (pages 245
and 246) shows the code for the algorithm
Data Structures and Algorithms in C++, Fourth Edition
52
Deletion (continued)
• Deletion by Merging (continued)
Fig. 6.28 Summary of deleting by merging
– The individual steps in the process are shown in Figure 6.30
– The numbers in the figure correspond to the numbers in the
comments of the code in Figure 6.29
Data Structures and Algorithms in C++, Fourth Edition
53
Deletion (continued)
Fig. 6.30 Details of deleting by merging
Data Structures and Algorithms in C++, Fourth Edition
54
Deletion (continued)
• Deletion by Merging (continued)
– The tree that results from merging may have a very different structure
from the original tree
– In some cases it can be taller, even skewed; occasionally it can be
shorter
– This does not mean the algorithm is inefficient; but we do need to try
and find a way to maintain some type of balance in the tree
– Figure 6.31 illustrates these issues; 6.31a shows the type of imbalance
that may occur, and 6.31b shows a “shorter” tree
Data Structures and Algorithms in C++, Fourth Edition
55
Deletion (continued)
Fig. 6.31 The height of a tree can be (a) extended
or (b) reduced after deleting by merging
Data Structures and Algorithms in C++, Fourth Edition
56
Deletion (continued)
• Deletion by Copying
– Another approach to handling deleting is called deletion by copying
and was proposed by Thomas Hibbard and Donald Knuth in the 1960s
– Initially, this works much like the merging process
– We locate the node’s predecessor by searching for the rightmost node
in the left subtree
– The key of this node replaces the key of the node to be deleted
– We then recall the two simple cases of deletion: if the rightmost node
was a leaf, we delete it; if it has one child, we set the parent’s pointer
to the node to point to the node’s child
– This way, we delete a key k1 by overwriting it by a key k2 and then
deleting the node holding k2
Data Structures and Algorithms in C++, Fourth Edition
57
Deletion (continued)
• Deletion by Copying (continued)
– This algorithm is implemented by two functions, the first of which is
shown in Figure 6.32
Fig. 6.32 Implementation of an algorithm for deleting by copying
Data Structures and Algorithms in C++, Fourth Edition
58
Deletion (continued)
• Deletion by Copying (continued)
– The second function works like the merging function of Figure 6.29,
except it calls the deleteByCopying()function rather than
deleteByMerging()
– A trace of this process is shown in Figure 6.33
– The numbers in the diagrams refer to the numbers in the code of
Figure 6.32
– This algorithm avoids the height increase problem of merging, but
problems can still result
– Since the algorithm always deletes the immediate predecessor of the
key being replaced, the left subtree can shrink while the right subtree
is unchanged, making the algorithm asymmetric
Data Structures and Algorithms in C++, Fourth Edition
59
Deletion (continued)
Figure 6.33 Deleting by copying
Data Structures and Algorithms in C++, Fourth Edition
60
Deletion (continued)
• Deletion by Copying (continued)
– Eventually the tree becomes unbalanced to the right, and the right
subtree is bushier and larger than the left
– A simple improvement can make this symmetric; we alternately delete
the node’s predecessor from the left subtree and its successor from
the right subtree
– This provides significant improvements; however the analysis has
proven to be extremely complex, and most studies focus on
simulations
Data Structures and Algorithms in C++, Fourth Edition
61
Balancing a Tree
• Two arguments have been presented in favor of trees:
– They represent hierarchical data particularly well
– Searching trees is much faster than searching lists
• However, this second point depends on the structure of the
tree
• As we’ve seen, skewed trees search no better than linear lists
• Three situations are presented in Figure 6.34
– A fairly well-balanced tree (Figure 6.34a)
– A right unbalanced tree (Figure 6.34b)
– A right skewed tree (Figure 6.34c)
• Neither of the last two situations occur in balanced trees
Data Structures and Algorithms in C++, Fourth Edition
62
Balancing a Tree (continued)
Fig. 6.34 Different binary search trees with the same information
• A binary tree is height balanced (or simply, balanced) if the
difference in height of the subtrees of any node in the tree is
zero or one
• It is perfectly balanced if it is balanced and all leaves are on
one or two levels
Data Structures and Algorithms in C++, Fourth Edition
63
Balancing a Tree (continued)
• The number of nodes that can be stored in binary trees of
different heights is shown in Figure 6.35
Fig. 6.35 Maximum number of nodes in binary trees of different heights
• From this we can see that if we store n elements in a perfectly
balanced tree, the height is lg  + 1
Data Structures and Algorithms in C++, Fourth Edition
64
Balancing a Tree (continued)
• So storing 10,000 items in such a tree gives us a height of
lg 10,001 = 13.288 = 14
• From the standpoint of searching, this means if 10000 items
are stored in a perfectly balanced tree, we’ll need to look at
14 items to find a particular one
• To find the item in an equivalent linked list would require
10,000 tests in the worst case
• So constructing a balanced tree, or modifying one to make it
balanced, is worth the effort
Data Structures and Algorithms in C++, Fourth Edition
65
Balancing a Tree (continued)
• There are many techniques for balancing a tree
– Some monitor the tree as items are added, and restructure the tree
when it becomes unbalanced
– Others reorder the data being processed and then build the tree, if the
reordering leads to the construction of a balanced tree
• We’ll first consider a process based on this latter technique
• In looking at Figure 6.34, notice that the structure of the trees
resulted from the order in which the data was placed
– In 6.34c, data was in ascending order, resulting in a right skewed tree
– In 6.34b, “B” arrived first, and since only “A” is less than “B”, all the
other nodes were placed in the right subtree
– In 6.34c, the root was near the middle of the list, resulting in a more
balanced tree
Data Structures and Algorithms in C++, Fourth Edition
66
Balancing a Tree (continued)
• This last arrangement suggests using an algorithm based on
the binary search technique to construct the tree
– The data is stored in an array as it arrives, then sorted
– The element in the middle of the array is designated as the root
– The elements in the middle of the left and right subarrays become the
roots of the left and right subtrees, etc.
– As this proceeds we build the tree one level at a time; first the root,
then its left and right children, etc.
– If we modify this to insert the root, then its left child, then the left
child’s child, etc., we can create a simple recursive algorithm
Data Structures and Algorithms in C++, Fourth Edition
67
Balancing a Tree (continued)
template <class T>
void BST<T>::balance(T data[], int first, int last) {
if (first <= last) {
int middle = (first = last)/2;
insert(data[middle]);
balance(data, first, middle-1);
balance(data, middle+1, last);
}
}
• An application of this algorithm is shown in Figure 6.36
Data Structures and Algorithms in C++, Fourth Edition
68
Balancing a Tree (continued)
Fig. 6.36 Creating a binary search tree from an ordered array
Data Structures and Algorithms in C++, Fourth Edition
69
Balancing a Tree (continued)
• This algorithm does suffer from one significant drawback
• All the data needs to be in the array before the tree can be
created
• So it can be unsuitable for a tree that needs to be used while
it is being constructed
• On the other hand, an unbalanced tree can be balanced easily
by carrying out an inorder traversal and writing the output to
an array
• This array can then be used to create a balanced version of
the tree
Data Structures and Algorithms in C++, Fourth Edition
70
Balancing a Tree (continued)
• The DSW Algorithm
– The previous algorithm is rather inefficient due to the need for an
auxiliary array to handle data storage during construction or
reorganization of the tree
– Thus it should only be used for fairly small trees
– There are other algorithms that use little extra storage and do not
require sorting, however
– One such algorithm was developed by Colin Day and modified by
Quentin F. Stout and Bette L. Warren; it is called the DSW algorithm
– Key to the behavior of this algorithm is the idea of rotation, introduced
by Adel’son-Vel’skii and Landis in 1962
– Two types of rotation can occur, left and right, which are symmetric
Data Structures and Algorithms in C++, Fourth Edition
71
Balancing a Tree (continued)
• The DSW Algorithm (continued)
– The right rotation of a node Ch around its parent Par is performed as
follows:
rotateRight(Gr, Par, Ch)
if Par is not the root of the tree //i.e.,if Gr is not null
grandparent Gr of child Ch becomes Ch’s parent;
right subtree of Ch becomes left subtree of Ch’s parent Par;
node Ch acquires Par as its right child;
– The steps of this are shown in Figure 6.37; note that the heart of this
process is the third step, when parent and child swap roles
– The first and second steps ensure that the tree remains a search tree
after the rotation is completed
Data Structures and Algorithms in C++, Fourth Edition
72
Balancing a Tree (continued)
• The DSW Algorithm (continued)
Fig. 6.37 Right rotation of child Ch about parent Par
– Essentially, the algorithm transforms an arbitrary binary search tree
into a linked list-like structure called a backbone or vine
– This is further transformed into a perfectly balanced tree by rotating
every second node of the backbone around its parent
Data Structures and Algorithms in C++, Fourth Edition
73
Balancing a Tree (continued)
• The DSW Algorithm (continued)
– The algorithm to create the backbone, which is the first step of the
process, is as follows:
createBackbone(root)
tmp = root;
while (tmp != 0)
if tmp has a left child
rotate this child about tmp; // hence the left child
// becomes parent of tmp
set tmp to the child that just became parent;
else set tmp to its right child;
– Figure 6.38 illustrates the operation of this algorithm; notice since
rotation requires knowing about the parent of tmp, another pointer
has to be used
Data Structures and Algorithms in C++, Fourth Edition
74
Balancing a Tree (continued)
Fig. 6.38 Transforming a binary search tree into a backbone
Data Structures and Algorithms in C++, Fourth Edition
75
Balancing a Tree (continued)
• The DSW Algorithm (continued)
– In the second step of the transformation, the backbone is
transformed into a perfectly balanced tree
– In each pass down the backbone, every second node is rotated about
its parent
– The first pass handles the difference between the number of nodes in
the backbone and the number of nodes in the closest complete binary
tree
– Overflowing nodes are treated separately
– An example of this is shown in Figure 6.39
– The backbone from Figure 6.38e is transformed in the first pass to the
backbone of Figure 6.39b; then two additional passes are executed
Data Structures and Algorithms in C++, Fourth Edition
76
Balancing a Tree (continued)
• The DSW Algorithm (continued)
Fig. 6.39 Transforming a backbone into a perfectly balanced tree
– In these diagrams, the nodes being promoted one level by left
rotations are shown as squares
– The circles are the parents about which they are rotated
Data Structures and Algorithms in C++, Fourth Edition
77
Balancing a Tree (continued)
• The DSW Algorithm (continued)
– The algorithm for this operation is shown following:
createPerfectTree()
n = number of nodes;
m =
;
make n-m rotations starting from the top of backbone;
while(m > 1)
m = m / 2;
make m rotations starting from the top of backbone;
Data Structures and Algorithms in C++, Fourth Edition
78
Balancing a Tree (continued)
• AVL Trees
– The balancing algorithms we’ve looked at so far can potentially involve
the entire tree
– However, rebalancing can be performed locally if the insertions or
deletions impact only a portion of the tree
– One well-known method is based on the work of Adel’son-Vel’skii and
Landis, and is named for them: the AVL tree
– An AVL tree (also called an admissible tree) is one where the height of
the left and right subtrees of every node differ by at most one
– Examples of AVL trees are shown in Figure 6.40
– Numbers in the nodes are the balance factors, which is the difference
between the height of the right and left subtrees and should be +1, 0,
or -1 for AVL trees
Data Structures and Algorithms in C++, Fourth Edition
79
Balancing a Tree (continued)
• AVL Trees (continued)
Fig. 6.40 Examples of AVL trees
– Notice that while the definition of an AVL tree is the same as that of a
balanced tree, the model implicitly includes the balancing techniques
– In addition, the process of balancing an AVL tree does not guarantee a
perfectly balanced tree
– The minimum number of nodes in an AVL tree is determined by:
AVLh = AVLh-1 + AVLh-2 + 1
Data Structures and Algorithms in C++, Fourth Edition
80
Balancing a Tree (continued)
• AVL Trees (continued)
– In this recurrence relation, the initial values are AVL0 = 0 and AVL1 = 1
– From this, we can derive the bounds on the height (h) of the AVL tree
based on the number of nodes (n):
lg(n + 1) < h < 1.44 lg(n + 2) – 0.328
– Recall that for a perfectly balanced tree, h = lg n + 1
– If any node in an AVL tree has its balance factor become less than -1 or
greater than 1, it has to be balanced
– There are four situations in which a tree can become unbalanced; two
are symmetrical to the other two, so we only need to look at two cases
– These two cases are illustrated in Figure 6.41 and 6.42
Data Structures and Algorithms in C++, Fourth Edition
81
Balancing a Tree (continued)
• AVL Trees (continued)
– The first case, shown in Figure 6.41, occurs when a node is inserted in
the right subtree of the right child
Fig. 6.41 Balancing a tree after insertion of a node in the right subtree of node Q
– The subtrees involved in the rotation have their heights indicated
– After a new node is inserted somewhere in the right subtree of Q to
unbalance the tree, Q rotates around is parent P to rebalance the tree
– This is illustrated in Figure 6.41b and Figure 6.41c
Data Structures and Algorithms in C++, Fourth Edition
82
Balancing a Tree (continued)
• AVL Trees (continued)
– The second case, shown in Figure 6.42, occurs when a node is inserted
in the left subtree of the right child, and is more complicated
Fig. 6.42 Balancing a tree after insertion of a node in the left subtree of node Q
– A node is inserted in Figure 6.42a, resulting in the tree in Figure 6.42b
– The derail of this is in Figure 6.42c; note the subtrees of R could be
reversed, giving R a value of -1
– The imbalance is solved by a double rotation: first R around Q (Figure
6.42d), then R around P (Figure 6.42e)
Data Structures and Algorithms in C++, Fourth Edition
83
Balancing a Tree (continued)
• AVL Trees (continued)
– These examples treat P as a stand-alone tree; however it could be part
of a larger tree, perhaps a child of another node
– As Figures 6.41 and 6.42 imply, the changes made to the subtree
originally rooted at P are sufficient to restore balance to the tree
– The problem then is to find the node P in the tree that becomes
imbalanced after an insertion
– This can be accomplished by moving back up the tree from the point
of insertion, updating the balance factors until one becomes ± 2
– This node then becomes P, the root of the subtree that needs to be
rebalanced
– The algorithm to accomplish the balance updates is shown as
pseudocode on pages 258 and 259
Data Structures and Algorithms in C++, Fourth Edition
84
Balancing a Tree (continued)
• AVL Trees (continued)
– Figure 6.43 illustrates what happens when a node is inserted and the
resulting updates cause a node to become imbalanced
Figure 6.43 An example of inserting (b) a new node in (a) an AVL tree,
which requires one rotation (c) to restore the height balance
– It is also possible that an insertion will only cause the balance factors
to be updated; in this case the path is followed back to the root
– This is shown in Figure 6.44
Data Structures and Algorithms in C++, Fourth Edition
85
Balancing a Tree (continued)
• AVL Trees (continued)
Fig. 6.44 In an (a) AVL tree a (b) new node is inserted requiring no height adjustments
– Deleting a node may require more work
– The deleteByCopying()algorithm is applied to delete the node
– The balance factors of all nodes from the parent of the deleted node
to the root are then updated
– Each node whose balance factor becomes ± 2 will be rebalanced
Data Structures and Algorithms in C++, Fourth Edition
86
Balancing a Tree (continued)
• AVL Trees (continued)
– This must be done for the entire path because deletion may not cause
an imbalance in a parent, but in a grandparent
– We’ll only consider those cases (four, with four symmetric cases)
which necessitate immediate rotation
– In each case, the assumption is the left child of the node P is deleted
– These cases are illustrated in Figure 6.45 on page 261
– In the first case, P’s balance factor is +1, and its right child, Q, is also at
+1; this is shown in Figure 6.45a
– Deleting a node from P’s left child leads to the tree in Figure 6.45b,
with P at +2; this is rebalanced by rotating P around Q (Figure 6.45c)
Data Structures and Algorithms in C++, Fourth Edition
87
Balancing a Tree (continued)
• AVL Trees (continued)
– In the second case, P’s balance factor is +1, and its right child, Q, is 0;
this is shown in Figure 6.45d
– Deleting a node from P’s left child leads to the tree in Figure 6.45e,
with P at +2; this is rebalanced as in the first case by rotating P around
Q (Figure 6.45f)
– So the first two cases, with Q either +1 or 0, can be handled in the
same way
– If the initial balance factor of Q is -1, which is true in the other two
cases, the process of rebalancing is more complex
– The third case arises when the left subtree of Q, rooted at R, has a
balance factor of -1 (Figure 6.45g)
– Rebalancing after deletion requires a double rotation; first of R about
Q and then of R about P (Figures 6.45h and 6.45i)
Data Structures and Algorithms in C++, Fourth Edition
88
Balancing a Tree (continued)
• AVL Trees (continued)
– The fourth case is similar to the third, but differs in that the balance
factor of R is +1, rather than -1
– However, rebalancing can be done with the same two rotations as the
third case so the third and fourth can be handled together (Figures
6.45j – l)
Data Structures and Algorithms in C++, Fourth Edition
89
Self-Adjusting Trees
• The focus of our efforts in balancing trees has been to keep
the trees properly structured
• Consequently, whenever a newly inserted node threatens a
tree’s balance, action is taken to correct the imbalance
– This can be done for the entire tree, using the DSW technique, or
locally, using the AVL process
• Is correcting the imbalance always necessary?
• Since trees are used to handle items quickly, it is the speed of
operations and not the tree’s structure that is critical
Data Structures and Algorithms in C++, Fourth Edition
90
Self-Adjusting Trees (continued)
• Consider the idea that not all items in a tree are likely to be
used with equal frequency
• If we keep track of the most frequently accessed items, and
structure the tree to improve access to them, we can improve
performance
• This is the basis for self-adjusting trees
• The strategy is to migrate up the tree those elements used
most often, creating a “priority tree”
Data Structures and Algorithms in C++, Fourth Edition
91
Self-Adjusting Trees (continued)
• We can keep track of frequency in a number of ways
• Each node could have a counter to record accesses; the tree
could be rearranged to move highest counts up in the tree
• A second approach is based on the assumption that an item
that has been accessed will be accessed soon again
– Each time an element is accessed, it is moved up the tree
– New elements are simply added where appropriate without
restructuring
– Although this could promote infrequently accessed objects, over a
period of use the more frequently used items will occupy the higher
levels of the tree
Data Structures and Algorithms in C++, Fourth Edition
92
Self-Adjusting Trees (continued)
• Self-Restructuring Trees
– Brian Allen, Ian Munroe, and James Bitner proposed a strategy with
two possibilities, shown in Figure 6.46
• Single rotation – if an element in a child is accessed, rotate the
child around the parent, unless it is the root (Figure 6.46a)
• Moving to the root – the parent-child rotation is repeated until
the element that was accessed is the root (Figure 6.46b)
Fig. 6.46 Restructuring a tree by (a) using a single rotation
or (b) moving to the root when accessing node R
Data Structures and Algorithms in C++, Fourth Edition
93
Self-Adjusting Trees (continued)
• Self-Restructuring Trees (continued)
– In the single-rotation approach, the more often an item is accessed
the closer it moves to the root, so access to the element improves
– With move-to-the-root, the assumption is the accessed element will
be accessed soon again, so it immediately moves to the root
– Even if the item isn’t used right away, it will remain close to the root
for further access
– Unfortunately, these strategies don’t work well in the case of skewed
trees such as we’ve seen earlier; although they will improve slowly
– This situation is displayed in Figure 6.47
Data Structures and Algorithms in C++, Fourth Edition
94
Self-Adjusting Trees (continued)
• Self-Restructuring Trees (continued)
Fig. 6.47 (a–e) Moving element T to the root
and then (e–i) moving element S to the root
Data Structures and Algorithms in C++, Fourth Edition
95
Self-Adjusting Trees (continued)
• Splaying
– Robert Tarjan and Daniel Sleator developed an alternative to the
move-to-the-root technique in 1985
– Called splaying, it applies single rotations in pairs; using an order
determined by the links between child, parent, and grandparent
– First we distinguish from among three cases, based on the relationship
between a node R, its parent Q, and grandparent P (if they exist)
• Case 1 – The root node is R’s parent
• Case 2 – Called the homogeneous configuration, R is the left child
of Q, and Q is the left child of P (or, R and Q are right children)
• Case 3 – Called the heterogeneous configuration, R is the right
child of Q and Q is the left child of P (or R is the left child of Q and
Q is the right child of P)
Data Structures and Algorithms in C++, Fourth Edition
96
Self-Adjusting Trees (continued)
• Splaying (continued)
– The algorithm to manipulate the node in the tree is as follows:
Splaying (P,Q,R)
while R is not the root
if R’s parent is the root
perform a singular splay, rotate R about its parent; (Figure 6.48a)
else if R is in a homogeneous configuration with its predecessors
perform a homogeneous splay, first rotate Q about P
and then R about Q; (Figure 6.48b)
else // R is in a heterogeneous configuration with its predecessors
perform a heterogeneous splay, first rotate R about Q
and then about P; (Figure 6.48c)
Data Structures and Algorithms in C++, Fourth Edition
97
Self-Adjusting Trees (continued)
• Splaying (continued)
Fig. 6.48 Examples of splaying
Data Structures and Algorithms in C++, Fourth Edition
98
Self-Adjusting Trees (continued)
• Splaying (continued)
– The difference in applying this technique to the tree of Figure 6.47a is
shown on Figure 6.49
– In accessing the node T in the fifth level of the tree, the shape is
improved a great deal; after accessing R, the improvement is dramatic
Fig. 6.49 Restructuring a tree with splaying (a–c) after accessing T and (c–d) then R
Data Structures and Algorithms in C++, Fourth Edition
99
Self-Adjusting Trees (continued)
• Splaying (continued)
– Even though splaying is a combination of two rotations (except when
next to the root), they are not necessarily bottom-up
– For homogeneous cases (left-right or right-right), the parent and
grandparent node are rotated, then the node and its parent
– The effect of this is both to move the node towards the root and to
flatten the tree, which improves access
– Splaying focuses on the elements rather than tree shape, so typically it
performs better when some elements are used more frequently
– If all the elements are accessed with about the same frequency, it will
not be as useful
– In those cases an alternative approach that focuses on balancing the
tree is better
Data Structures and Algorithms in C++, Fourth Edition
100
Self-Adjusting Trees (continued)
• Splaying (continued)
– A modification called semisplaying is illustrated in Figure 6.48b
– It requires one rotation for a homogeneous splay, and continue
splaying with the parent of the node, rather than the node itself
– Figure 6.50 illustrates the application of semisplaying
– The tree of Figure 6.49a becomes more balanced after accessing node
T using this technique (Figures 6.50a – c)
– After T is accessed a second time, the resulting tree (Figure 6.50d) is
structurally similar to Figure 6.46a
– Although theoretical results are favorable, empirical results for various
trees show that AVL tress work better than self-modifying ones; many
times even a regular binary tree outperforms these
Data Structures and Algorithms in C++, Fourth Edition
101
Self-Adjusting Trees (continued)
• Splaying (continued)
Fig. 6.50 (a–c) Accessing T and restructuring the tree with semisplaying; (c–d) accessing T again
Data Structures and Algorithms in C++, Fourth Edition
102
Heaps
• A heap is a special type of binary tree with the following
properties:
– The value of each node is greater than or equal to the values stored in
its children
– The tree is perfectly balanced, and the leaves in the last level are
leftmost in the tree
• This actually defines a max heap; if “greater than” is replaced
by “less than” in the first property, we have a min heap
• Thus the root of a max heap is the largest element, and the
root of a min heap the smallest
• If each nonleaf of a tree exhibits the first property, the tree
exhibits the heap property
Data Structures and Algorithms in C++, Fourth Edition
103
Heaps (continued)
• Figure 6.51 exhibits some examples; those in Figure 6.51a are
heaps, while those in Figure 6.51b violate the first property
and those in Figure 6.51c violate the second
Fig. 6.51 Examples of (a) heaps and (b–c) nonheaps
Data Structures and Algorithms in C++, Fourth Edition
104
Heaps (continued)
• Heaps can be implemented as arrays
• As an example, consider the array data=[2 8 6 1 10 15 3
12 11] represented as a nonheap tree in Figure 6.52
Fig. 6.52 The array [2 8 6 1 10 15 3 12 11]
seen as a tree
• The arrangement of the elements reflects the tree from top-tobottom and left-to-right
Data Structures and Algorithms in C++, Fourth Edition
105
Heaps (continued)
• We can define a heap as an array heap of length n where
heap[i] > heap[2i + 1], for 0 < i < (n – 1)/2
and
heap[i] > heap[2i + 2], for 0 < i < (n – 2)/2
• Elements in a heap are not ordered; we only know the root is
the largest and the descendants are less than or equal to it
• The relationship between siblings or between elements in
adjacent subtrees is undetermined
• All we are aware of is that there is a linear relationship along
the lines of descent, but lateral lines are ignored
Data Structures and Algorithms in C++, Fourth Edition
106
Heaps (continued)
• This is why, although all the trees in Figure 6.53 are heaps,
Figure 6.53b is ordered the best
Fig. 6.53 Different heaps constructed with the same elements
Data Structures and Algorithms in C++, Fourth Edition
107
Heaps (continued)
• Heaps as Priority Queues
– Heaps are ideal for implementing priority queues
– We saw linked lists used to do this in section 4.3, but for large
amounts of data, they can become inefficient
– Because heaps are perfectly balanced trees, the inherent efficiency of
searching such structures makes them more useful
– We will need a couple of routines to enqueue and dequeue elements
on the priority queue, though
– To enqueue, the node is added at the end of the heap as the last leaf
– If the heap needs to be restructured to preserve the heap property, it
can be done by moving the node from last leaf towards the root
Data Structures and Algorithms in C++, Fourth Edition
108
Heaps (continued)
• Heaps as Priority Queues (continued)
– The enqueuing algorithm is as follows:
heapEnqueue(el)
put el at the end of the heap;
while el is not in the root and el > parent(el)
swap el with its parent;
– This is illustrated in Figure 6.54a, where the node 15 is added to the
heap
– Because this destroys the heap property, 15 is moved up the tree until
it is either the root or finds a parent greater than or equal to 15
– This is reflected in Figure 6.54b-d
Data Structures and Algorithms in C++, Fourth Edition
109
Heaps (continued)
• Heaps as Priority Queues (continued)
Fig. 6.54 Enqueuing an element to a heap
Data Structures and Algorithms in C++, Fourth Edition
110
Heaps (continued)
• Heaps as Priority Queues (continued)
– Dequeuing an element from a heap simply removes the root (since it
is the largest value) and replacing it by the last leaf
– Since this will most likely violate the heap property, the node is moved
down the tree to the appropriate location
– The algorithm for this looks like:
heapDequeue()
extract the element from the root;
put the element from the last leaf in its place;
remove the last leaf;
// both subtrees of the root are heaps
p = the root;
while p is not a leaf and p < any of its children
swap p with the larger child;
Data Structures and Algorithms in C++, Fourth Edition
111
Heaps (continued)
• Heaps as Priority Queues (continued)
– This is shown in Figure 6.55; 20 is dequeued and 6 put in its place
– This is then swapped with 15 (the larger child) and again with 14
Fig. 6.55 Dequeuing an element from a heap
Data Structures and Algorithms in C++, Fourth Edition
112
Heaps (continued)
• Heaps as Priority Queues (continued)
– The last three lines of this dequeuing algorithm can be used as a
stand-alone routine to restore the heap property if it is violated by the
root by moving it down the tree; a coded form is shown below:
Fig. 6.56 Implementation of an algorithm to move the root element down a tree
Data Structures and Algorithms in C++, Fourth Edition
113
Heaps (continued)
• Organizing Arrays as Heaps
– As we’ve seen, heaps can be implemented as arrays, but not all arrays
are heaps
– In some circumstances, though, we need to organize the contents of
an array as a heap, such as in the heap sort
– One of the simpler ways to accomplish this is attributed to John
Williams; we start with an empty heap and sequentially add elements
– This is a top-down technique that extends the heap by enqueuing new
elements in the heap
– This process is described on page 273 and illustrated in Figure 6.57
Data Structures and Algorithms in C++, Fourth Edition
114
Organizing arrays as heaps
Data Structures and Algorithms in C++, Fourth Edition
115
Heaps (continued)
• Organizing Arrays as Heaps (continued)
– A bottom-up approach that starts by forming small heaps and merging
them into larger heaps was proposed by Robert Floyd
– The algorithm follows:
FloydAlgorithm(data[])
for i = index of the last nonleaf down to 0
restore the heap property for the tree whose root is
data[i] by calling moveDown(data, i, n-1);
– Figure 6.58 (page 275) shows an example of transforming the array
data[] = [2 8 6 1 10 15 3 12 11] into a heap
– The process is described in detail on pages 273 - 276
Data Structures and Algorithms in C++, Fourth Edition
116
Organizing
arrays as
heaps
Data Structures and Algorithms in C++, Fourth Edition
117
Treaps
• Heaps suffer from one significant problem
• While they allow rapid access to the largest (or smallest)
element in the heap, accessing any other element is awkward
• As we’ve seen binary search trees are ideal for searching, but
can lose this efficiency if they aren’t kept balanced
• To take advantage of the strengths of both of these data
structures, we can create a treap
• A treap is a binary search tree that associates priorities with
elements and organizes itself into a heap according to these
priorities
• The heap in this case preserves only the heap property
Data Structures and Algorithms in C++, Fourth Edition
118
Treaps (continued)
• An example of this is shown in Figure 6.59 (page 278), where
the binary search tree (6.59a) is combined with a max-heap
(Figure 6.59b) to form the treap (6.59c)
• When an item is inserted, a priority is generated, and then the
element is added as a leaf
• If the new node’s priority is larger than its parent, it is rotated
around the parent; this is repeated until the parent’s priority
is larger (or the new node becomes the root)
• Figures 6.59d-j shows this process first for the node G with a
priority of 17, and then for node J with priority 25
Data Structures and Algorithms in C++, Fourth Edition
119
Treaps (continued)
• Deleting a node from a treap requires rotating the node with
the child of higher priority, and continuing this until only one
child is left or the node becomes a leaf
• This is illustrated in Figure 6.59j-l, where node J is deleted by
first rotating it with M, and then deleting it
• It is also possible to process treaps without storing the
priorities in the nodes
• One approach uses a hash function, h, on the key value, K, to
generate the priority h(K)
• Another approach uses an array to store the nodes
Data Structures and Algorithms in C++, Fourth Edition
120
Treaps (continued)
• A treap that behaves like a min-heap stores only the index of
the location the element occupies, which serves as its priority
• To insert an item, an index i is generated, subject to i < n (the
number of elements in the treap and array)
• If i = n, the element is placed in the nth location in the array
and inserted into the treap
• Otherwise the element in position i is moved to position n
(with priority n) through rotations and the new element is
inserted in that position with priority i
• To delete an item, it is first deleted from the treap, and then
removed from the array
• The item in position n is then rotated to this position (upward
in the treap)
Data Structures and Algorithms in C++, Fourth Edition
121
K-d Trees
• In the binary search trees we’ve looked at, a single key has
been used to perform all operations in the tree
• It is possible to use multiple keys and still retain the pure
binary tree form, however
• One approach, developed by Jon Bentley in 1975, is the
multidimensional (k-dimensional) binary search tree or K-d
tree
• The multidimensionality refers to the items stored in the tree
and not the tree itself
• These structures are often used in multi-key searches such as
range searches and nearest neighbor searches
Data Structures and Algorithms in C++, Fourth Edition
122
K-d Trees (continued)
• If we consider points in a Cartesian plane, each point is
characterized by two values, the x and y coordinates
• If we used a binary tree to store the points, we could use
either coordinate as a key to determine the location to insert
the point, or a concatenation of both
• To use both keys separately in descending binary tree, the x
coordinate can be used on odd levels and the y on even levels
• The structure of this tree is illustrated by the example shown
on the left in Figure 6.61
• “A” is the root, so a vertical line corresponding to its x
coordinate is drawn through it
• Nodes to the left are in the left subtree, and those to the right
are in the right subtree
Data Structures and Algorithms in C++, Fourth Edition
123
K-d Trees (continued)
• On the next level, the y coordinate is used, so horizontal lines
are drawn through “G” and “B”
• Points below are in the left subtree of the node, and points
above are in the right subtree
• This continues until all nodes are processed
• The tree corresponding to this partitioning is shown on the
right of Figure 6.61
• The values for the coordinates are based on a 100 x 100
square; generally there is no limitation in size and the points
can be anywhere on the plane
Data Structures and Algorithms in C++, Fourth Edition
124
K-d Trees (continued)
Fig. 6.61 An example of a 2-d tree
• We can generalize a k-d tree to store any number of keys of
any type
• For example, the database table on the left of Figure 6.62 can
be represented using the 3-d tree on the right
• The name is used on level 1, YOB on level 2 and salary on level
3; name would be used again on level 4, etc.
Data Structures and Algorithms in C++, Fourth Edition
125
K-d Trees (continued)
• A pseudocode algorithm for insertion is shown on page 281
• To search for a particular item in a k-d tree (an exact match),
we use the same technique as in a binary search
• The major difference is that at each level, we have to use the
appropriate key, and all keys have to match
• We can also use the k-d tree to output items in a particular
range (called a region query)
• For a given node, we first check that the element is within a
specified region
• We then check the element’s key, and if it is in the range
specified for the key, we continue with both children
Data Structures and Algorithms in C++, Fourth Edition
126
K-d Trees (continued)
• If the key is above the lower bound, but not smaller then the
upper bound, we only check the left subtree
• Likewise, if the key is below the upper bound, but not larger
than the lower bound, we check the right subtree
• Pseudocode for this procedure is shown on page 282
• Deletion of a node in a k-d tree is more complicated than a
standard binary tree due to the placement of predecessors
and successors
• In Figure 6.61, for example the immediate predecessor of “A”
is “H”, not “G” as would be determined in a binary algorithm
• The problem is that at the level “G” is at, y keys are used, so
there may be nodes in the left subtree with larger x values
Data Structures and Algorithms in C++, Fourth Edition
127
K-d Trees (continued)
• So to find a node with a smaller x coordinate, we have to
investigate both subtrees
• In general, if an immediate successor with respect to a given
key is to be found, then if we are at a node on a level where
that key is used, we can examine the right subtree
• Otherwise we will need to examine both subtrees
• If the node, p, doesn’t have a right subtree, then we examine
the left subtree to find the smallest node, q
• Information from q is copied over the information in p, the
left subtree of p becomes the right subtree of p, and then q is
deleted
• This is illustrated in Figure 6.63 on page 284
Data Structures and Algorithms in C++, Fourth Edition
128
K-d Trees (continued)
• A pseudocode algorithm for the deletion process is shown on
page 285
• Notice that when deleting the root, on levels using the x
coordinate (except the root level) only left subtrees need be
looked at
• This is shown in Figure 6.63a and 6.63c for the value (20, 40);
the right subtree is not searched
• In general, in a k-d tree only for nodes on every kth level are
their right subtrees not searched
Data Structures and Algorithms in C++, Fourth Edition
129
Polish Notation and Expression Trees
• One of the more significant applications of binary trees is the
explicit representation of arithmetic, relational, and logical
expressions
• Polish notation, developed by Jan Lukasiewicz in the 1920s, is
a special notation for symbolic logic that allows us to
eliminate all parentheses from formulas
• While the resulting formulas were less readable, with the
advent of computers the technique proved to be very useful
• For the sake of readability and avoiding ambiguity, we write
formulas with extra symbols like parentheses
Data Structures and Algorithms in C++, Fourth Edition
130
Polish Notation and Expression Trees
(continued)
• However, if we are only concerned with ambiguity, as in a
compiler, we can omit the extra symbols
• This requires we rearrange the symbols used in the formulas
• So how does Polish notation work?
• If we consider an expression like 2 – 3 ∙ 4 + 5, the results will
depend on the order the operations are executed
• Based on the traditional hierarchy, multiplication is done first,
then addition and subtraction from left to right, yielding -5
• However, doing the addition and subtraction before
multiplication yields -9; multiplying and adding before
subtraction gives us -15
Data Structures and Algorithms in C++, Fourth Edition
131
Polish Notation and Expression Trees
(continued)
• A computer doesn’t know what the default order of
operations is, and unlike humans, can’t use parentheses to
override default behavior
• So all expressions a compiler encounters need to be broken
down unambiguously and put into proper order
• This is where Polish notation comes in handy; using it we can
create an expression tree that defines the order of operations
• Using this, the three expressions we’ve seen can be
represented by the three trees in Figure 6.64
• Notice that with the trees, there is no ambiguity; the final
result can only be determined from the intermediate results
Data Structures and Algorithms in C++, Fourth Edition
132
Polish Notation and Expression Trees
(continued)
Fig. 6.64 Examples of three expression trees and results of their traversals
Data Structures and Algorithms in C++, Fourth Edition
133
Polish Notation and Expression Trees
(continued)
• Notice no parentheses are used, but there is no ambiguity
• This structure can be retained even when the tree is linearized
by traversing the tree and output the symbols based on the
traversal order
• The three relevant traversals are preorder, inorder, and
postorder, which are shown in Figure 6.64
• Notice that traversing each tree using inorder produces the
same expression, implying inorder isn’t useful to us
• The other two are however, so we can use them to create
unambiguous expressions
Data Structures and Algorithms in C++, Fourth Edition
134
Polish Notation and Expression Trees
(continued)
• Because of the usefulness and importance of these traversals,
the results they produce are given names to distinguish them
• Preorder traversals produce prefix notation, where the
operator precedes the operands it works on, such as in LISP
• Postorder traversals generate postfix notation, where the
operator follows the operands it works on, such as in Forth
• Inorder traversals create infix notation, where the operator is
in between its operands
• It is the infix notation form that we are most familiar with in
reading and creating expressions
Data Structures and Algorithms in C++, Fourth Edition
135
Polish Notation and Expression Trees
(continued)
• Operations on Expression Trees
– As we’ve already seen, binary trees can be created top-down or
bottom-up
– We seen the first technique applied in dealing with tree insertion; the
second approach will be used to create expression trees while
scanning infix expressions left to right
– The critical part of this is retaining the precedence of operators, as
was shown in Figure 6.64
– This is simple if parentheses are not allowed, so the algorithm needs
to be able to handle an arbitrary depth of nesting in expressions
– An ideal method for doing this is through recursion; the interpreter
from the case study in chapter 5 will be modified to create a tree
constructor
Data Structures and Algorithms in C++, Fourth Edition
136
Polish Notation and Expression Trees
(continued)
• Operations on Expression Trees (continued)
– As the trees in Figure 6.64 show, each node is either an operator or an
operand; these can all be represented as strings using the class:
class ExprTreeNode {
public:
ExprTreeNode (char *k, ExprTreeNode *l, ExprTreeNode *r) {
key = new char[strlen(k) +1];
strcpy (key, k);
left = l; right = r;
}
…
private:
char *key;
ExprTreeNode *left, *right;
}
Data Structures and Algorithms in C++, Fourth Edition
137
Polish Notation and Expression Trees
(continued)
• Operations on Expression Trees (continued)
– The expressions that are converted to trees use the same syntax as the
expressions in the chapter 5 case study, so the same syntax diagrams
can be used
– Based on those diagrams, a class ExprTree can be defined in which
methods for factors and terms can be defined using the pseudocode
shown on page 288
– It turns out that the tree form of expressions is ideal for generating
assembly code in compilers; the pseudocode from the ExprTree
class to do this is shown on page 289
– Using this, the expression (var2 + n) * (var2 + var1)/5 becomes the
tree in Figure 6.65, and the generatecode() function creates the
intermediate code following the tree
Data Structures and Algorithms in C++, Fourth Edition
138
Polish Notation and Expression Trees
(continued)
• Operations on Expression Trees (continued)
Figure 6.65 An expression tree
Data Structures and Algorithms in C++, Fourth Edition
139
Polish Notation and Expression Trees
(continued)
• Operations on Expression Trees (continued)
– Expression trees can be used for a number of other symbolic
manipulations, such as differentiation
– Rules for differentiation are illustrated in Figure 6.66 and the
pseudocode that follows it on page 290
Data Structures and Algorithms in C++, Fourth Edition
140

similar documents