Report

2-3-4 Trees Chapter 10 – Part 1 2-3-4 Trees and Related Topics 1 Introduction • Multi-way Trees are trees that can have up to four children and three data items per node. • 2-3-4 Trees: Very nice features – Are always balanced. (What does balanced mean??) – Reasonably easy to program . – Serve as an introduction to the understanding of B-Trees!! • B-Trees: another kind of multi-way tree particularly useful in organizing external storage, like files. – B-Trees (later lectures) can have dozens or hundreds of children with hundreds of thousands of records! 2 Introduction to 2-3-4 Trees • Shape of nodes is a lozenge-shaped. • In a 2-3-4 tree, all leaf nodes are at the same level. (but data can appear in all nodes) 50 30 10 20 40 60 55 62 64 66 70 80 75 83 86 3 • The 2, 3, and 4 in the name refer to how many links to child nodes can potentially be contained in a given node. • • For non-leaf nodes, three arrangements are possible: – A node with only one data item always has two children – A node with two data items always has three children – A node with three data items always has four children. • For non-leaf nodes with at least one data item ( a node will not exist with zero data items), the number of links may be 2, 3, or 4. 4 • Non-leaf nodes must/will always have one more child (link) than it has data items (see below); – Equivalently, if the number of child links is L and the number of data items is D, then L = D+1. 50 30 10 20 40 60 55 62 64 66 70 80 75 83 86 5 More Introductory stuff 50 30 10 20 40 60 55 62 64 66 70 80 75 83 86 • Critical relationships determine the structure of 2-3-4 trees: • A leaf node has no children, but can still contain one, two, or three data items ( 2, 3, or 4 links); cannot be empty. (See figure above) • Because a 2-3-4 tree can have nodes with up to four children, it’s called a multiway tree of order 4. 6 Still More Introductory stuff • Binary (and Red Black) trees may be referred to as multiway trees of order 2 - each node can have up to two children. • But note: in a binary tree, a node may have up to two child links ( but one or more may be null). • In a 2-3-4 tree, nodes with a single link are NOT permitted; – a node with one data item must have two links (unless it’s a leaf); – nodes with two data items must have three children; – nodes with three data items must have four children. • (You will see this more clearly once we talk about how they are actually built.) 7 Even More Introductory stuff • These numbers are important. • For data items, a node with one data item: points to (links to) lower level nodes that have values less than the value of this item and a pointer to a node that has values greater than or equal to this value. • For nodes with two links: a node with two links is called a 2-node; a node with three links is called a 3-node; with four links, a 4-node. (no such thing as a 1-node). 8 50 30 10 20 40 2-node 2-node 55 60 62 64 66 70 4-node 80 75 83 86 Do you see any 2-nodes? 3-nodes? 4-nodes? Do you see: a node with one data item that has two links? a node with two data items having three children; a node with three data items having four children? 9 2-3-4 Tree Organization • Very different organization than for a binary tree. • First, we number the data items in a node 0,1,2 and number child links: 0,1,2,3. Very Important. • Data items are always ascending: left to right in a node. • Relationships between data items and child links is easy to understand but critical for processing. 10 More on 2-3-4 Tree Organization A B Points to nodes w/keys < A ; Nodes with key between A and <B C Nodes w/keys between B and < C Nodes w/keys > C See below: (Equal keys not permitted; leaves all on same level; upper level nodes often not full; tree balanced! Its construction always maintains its balance, even if you add additional data items. (ahead) 50 30 10 20 40 2-node 2-node 55 60 62 64 66 70 80 75 4-node 83 86 11 Searching a 2-3-4 Tree • • • • A very nice feature of these trees. You have a search key; Go to root. Retrieve node; search data items; If hit: – done. • Else – Select the link that leads to the appropriate subtree with the appropriate range of values. – If you don’t find your target here, go to next child. (notice data items are sequential – VIP later) – etc. Data will ultimately be ‘found’ or ‘not found.’ 12 Try it: search for 64, 40, 65 50 30 10 20 40 2-node 2-node 55 60 62 64 66 70 80 75 4-node 83 86 Note: Nodes serve as holders of data and holders of ‘indexes’. Note: can easily have a ‘no hit’ condition Note: the sequential nature after indexing…sequential searching within node. 13 So, how do we Insert into this Structure? • Can be quite easy; sometimes very complex. – Can do a top-down or a bottom-up approach… • Easy Approach: – Start with searching to find a spot for data item. • We like to insert at the leaf level, but we will take the top-down approach to get there… So, – Inserting may very likely involve moving a data item around to maintain the sequential nature of the data in a leaf. 14 Node Split – a bit more difficult (1 of 2) • Using a top-down 2-3-4 tree. • If we encounter a full node in looking for the insertion point. – We must split the full nodes. • You will see that this approach keeps the tree balanced. 15 Node Split – Insertion: more difficult – 2 of 2 Upon encountering a full node (searching for a place to insert…) 1. split that node at that time. 2. move highest data item from the current (full) node into new node to the right. 3. move middle value of node undergoing the split up to parent node (Know we can do all this because parent node was not full) 4. Retain lowest item in node. 5. New node (to the right) only has one data item (the highest value) 6. Original node (formerly full) node contains only the lowest of the three values. 7. Rightmost children of original full node are disconnected and connected to new children as appropriate (They must be disconnected, since their parent data is changed) New connections conform to linkage conventions, as expected. 8. Insert new data item into the original leaf node. Note: there can be multiple splits encountered en route to finding the insertion point. 16 Insert: Here: Split is NOT the root node. Let’s say we want to add a 99 (from book)… 62 Want to add a data value of 99 Split this node… … other stuff 83 74 87 89 92 97 99 to be inserted… 104 112 17 Case 1 Insert: Split is NOT the root node (Let’s say we want to add a 99 (from book)…) 2. 92 moves up to parent node. (We know it was not full) 62 92 3. 83 stays put … other stuff 83 74 87 89 1. 104 starts a new node 104 97 99 112 4. Two rightmost children of split node are reconnected to new node. 5. New data item moved in. 18 If Root itself is full: Split the Root • Here, the procedure is the same. • Root is full. Create a sibling – Highest value data is moved into new sibling; – first (smallest value) remains in node; – middle value moves up and becomes data value in new root. • Here, two nodes are created: – A new sibling and a new root. 19 Splitting on the Way Down • Note: once we hit a node that must be split (on the way down), we know that when we move a data value ‘up’ that ‘that’ node was not full. – May be full ‘now,’ but it wasn’t on way down. • Algorithm is reasonably straightforward. • Do practice the splits on Figure 10.7. – Will see later on next exam. • I strongly recommend working the node splits on page 381. Ensure you understand how they work. • Just remember: – 1. You are splitting a 4-node. Node being split has three data values. Data on the right goes to a new node. Data on the left remains; data in middle is promoted upward; new data item is inserted appropriately. – 2. We do a node split any time we encounter a full node and when we are trying to insert a new data value. 20 Objects of this Class Represent Data Items Actually Stored. // tree234.java import java.io.*; //////////////////////////////////////////////////////////////// class DataItem { This is merely A data item stored at the nodes. In practice, this might be an entire record or object. Here we are only showing the key, where the key may represent the entire object. public int dData; // one data item //-------------------------------------------------------------public DataItem(int dd) // constructor { dData = dd; }// end constructor //-------------------------------------------------------------public void displayItem() // display item, format "/27" { System.out.print("/"+dData); } // end displayItem() //-------------------------------------------------------------} // end class DataItem 21 Following code is for processing that goes on inside the Nodes themselves – not the entire tree. Not trivial code. Try to understand totally. We will dissect carefully and deliberately! 22 This is what a Node looks like: Note: two arrays: a child array and an item array. private static final int ORDER = 4; Note their size: nodes = 4; item = 3. private int numItems; 4 The child array is size 4: the links: maximum children. private Node parent; 3 private Node childArray[] = new Node[ORDER]; The second array, itemArray is of size 3 – the private DataItem itemArray[] = new DataItem[ORDER-1]; maximum number of data items in a node. // ------------------------------------------------------------numItems is the number of items in the itemArray. parent will be used as a reference when inserting. class Node { public void connectChild(int childNum, Node child) {// connect child to this node childArray[childNum] = child; if(child != null) child.parent = this; } // ------------------------------------------------------------public Node disconnectChild(int childNum) {// disconnect child from this node, return it Node tempNode = childArray[childNum]; childArray[childNum] = null; return tempNode; Major work done by findItem(), insertItem() and } removeItem() (next slides) for a given node. // ------------------------------------------------------------public Node getChild(int childNum) These are complex routines and NOT to be confused { return childArray[childNum]; } with find() and insert() for the Tree234 class itself. // ------------------------------------------------------------These are find() and insert() within THIS node. public Node getParent() { return parent; } // ------------------------------------------------------------Recall: references are automatically initialized to null public boolean isLeaf() and numbers to 0 when their object is created. { return (childArray[0]==null) ? true : false; } So, Node doesn’t need a Constructor. // ------------------------------------------------------------public int getNumItems() 23 { return numItems; } One of three slides of code for class Node public DataItem getItem(int index) // get DataItem at index { return itemArray[index]; } // ------------------------------------------------------------public boolean isFull() { return (numItems==ORDER-1) ? true : false; } // ------------------------------------------------------------public int findItem(int key) // return index of item (within node) { for(int j=0; j<ORDER-1; j++) // if found, otherwise return -1 { if(itemArray[j] == null) break; else Find routine: if(itemArray[j].dData == key) Looking for the data within return j; the node where we are located. }// end for return -1; } // end findItem // ------------------------------------------------------------ public DataItem removeItem() { // removes largest item Delete Routine // assumes node not empty DataItem temp = itemArray[numItems-1]; // use index, save item Saves the deleted item. itemArray[numItems-1] = null; // disconnect it Sets the location contents to null. numItems--; // one less item Decrements the number of items return temp; // return item at the node. } Returns the deleted data item. // ------------------------------------------------------------public void displayNode() { // format "/24/56/74/" for(int j=0; j<numItems; j++) itemArray[j].displayItem(); // "/56" System.out.println("/"); // final "/" 24 } Class Node (continued) Class Node (continued) Insert Routine Increments number of items in node. Get key of new item. Now loop. // ------------------------------------------------------------ public int insertItem(DataItem newItem) { // assumes node is not full numItems++; int newKey = newItem.dData; // will add new item // key (int value) of new item for(int j=ORDER-2; j>=0; j--) // start on right to examine data { // looking for spot to insert if(itemArray[j] == null) // if item null, go left one cell. continue; // Recall: what does ‘continue’ do? Go through code and my comments.else { // if not null, get its key Start looking for place to insert the int itsKey = itemArray[j].dData; // not necessary, but … data item. Start on the right and if(newKey < itsKey) // if existing key is bigger, proceed left looking for proper place. itemArray[j+1] = itemArray[j]; // shift whole node right else { // otherwise, insert new item and return index.+1 itemArray[j+1] = newItem; // copies over moved item… return j+1; } } // end else (not null) } // end for // shifted all items, itemArray[0] = newItem; // insert new item return 0; } // end insertItem() 25 Code for the 2-3-4 Tree itself 26 class Tree234App { This code is merely the interface for a client. Pretty easy to follow. All the complex processing is undertaken at the tree level and at the node level. Be certain to recognize this. Note the ‘break’ in the case statements… public static void main(String[] args) throws IOException { long value; Tree234 theTree = new Tree234(); theTree.insert(50); theTree.insert(40); theTree.insert(60); theTree.insert(30); theTree.insert(70); while(true) { System.out.print("Enter first letter of "); System.out.print("show, insert, or find: "); char choice = getChar(); switch(choice) { case 's': theTree.displayTree(); break; case 'i': System.out.print("Enter value to insert: "); value = getInt(); theTree.insert(value); break; case 'f': System.out.print("Enter value to find: "); value = getInt(); int found = theTree.find(value); if(found != -1) System.out.println("Found "+value); else System.out.println("Could not find "+value); break; default: System.out.print("Invalid entry\n"); } // end switch 27 } // end while Class Tree234App (continued) //-------------------------------------------------------------public static String getString() throws IOException { InputStreamReader isr = new InputStreamReader (System.in); BufferedReader br = new BufferedReader(isr); String s = br.readLine(); return s; } //-------------------------------------------------------------public static char getChar() throws IOException { String s = getString(); return s.charAt(0); } //------------------------------------------------------------public static int getInt() throws IOException { String s = getString(); return Integer.parseInt(s); } //------------------------------------------------------------28 } // end class Tree234App class Tree234 { The object of type Tree234 IS the entire tree.private Node root = new Node(); // make root node Note: tree only has one attribute, its root. public int find(int key) { This is all it needs. Node curNode = root; int childNumber; while(true) { if(( childNumber=curNode.findItem(key) ) != -1) return childNumber; // found; recall findItem returns index v else if( curNode.isLeaf() ) return -1; // can't find it else // search deeper Finding, Searching, and Splitting algorithms curNode = getNextChild(curNode, key); are all shown here. } // end while }// end find() public void insert(int dValue) { // insert a DataItem Node curNode = root; DataItem tempItem = new DataItem(dValue); while(true) Call split routine { if( curNode.isFull() ) { split(curNode); // call to split node See creation of additional two nodes curNode = curNode.getParent(); // back up // search onc curNode = getNextChild(curNode, dValue); } // end if(node is full) else if( curNode.isLeaf() ) // if node is leaf, insert data break; else // node is not full, not a leaf, so go to lower level curNode = getNextChild(curNode, dValue); 29 } // end while Class Tree234 (continued) Be careful in here. Remember, the middle value is moved to parent and must be disconnected (and made null). Rightmost element is moved into new node as leftmost data item. Review the process and then note the code that implements the process. public void split(Node thisNode) { // split the node // assumes node is fu DataItem itemB, itemC; // When you get here, you know you need to split… Node parent, child2, child3; int itemIndex; itemC = thisNode.removeItem(); // remove items from this node. itemB = thisNode.removeItem(); // Note: these are second and third items child2 = thisNode.disconnectChild(2); // remove children These are rightmost child3 = thisNode.disconnectChild(3); // from this node two children Node newRight = new Node(); // make new node if(thisNode==root) // if the node we’re looking at is the root, { root = new Node(); // make new root parent = root; // root is our parent root.connectChild(0, thisNode); // connect to parent } else // this node to be split is not the root parent = thisNode.getParent(); // get parent // deal with parent itemIndex = parent.insertItem(itemB); // item B to parent int n = parent.getNumItems(); // total items? for(int j=n-1; j>itemIndex; j--) { // move parent's connections Node temp = parent.disconnectChild(j); // one child to the right parent.connectChild(j+1, temp); } parent.connectChild(itemIndex+1, newRight); // connect newRight to parent // deal with newRight newRight.insertItem(itemC); // item C to newRight newRight.connectChild(0, child2); // connect to 0 and 1 30 newRight.connectChild(1, child3); // on newRight Class Tree234 (continued) // gets appropriate child of node during search for value public Node getNextChild(Node theNode, int theValue) { int j; // assumes node is not empty, not full, not a leaf int numItems = theNode.getNumItems(); for(j=0; j<numItems; j++) // for each item in node // are we less? if( theValue < theNode.getItem(j).dData ) return theNode.getChild(j); // return left child // end for // we're greater, so return theNode.getChild(j); // return right child }// end getNextChild() // ------------------------------------------------------------public void displayTree() { recDisplayTree(root, 0, 0); } // ------------------------------------------------------------private void recDisplayTree(Node thisNode, int level, int childNumber) { System.out.print("level="+level+" child="+childNumber+" "); thisNode.displayNode(); // display this node // call ourselves for each child of this node int numItems = thisNode.getNumItems(); for(int j=0; j<numItems+1; j++) { Node nextNode = thisNode.getChild(j); if(nextNode != null) recDisplayTree(nextNode, level+1, j); else return; } } // end recDisplayTree() // -------------------------------------------------------------\ 31 } // end class Tree234 Efficiency Considerations 32 Efficiency Considerations for 2-3-4 Trees • Searching: • 2-3-4 Trees: one node must be visited, but – More data per node / level. – Searches are fast. • recognize all data items at node must be checked in a 2-3-4 tree, • but this is very fast and is done sequentially. • All nodes in the 2-3-4 tree are NOT always full. • Overall, for 2-3-4 trees, the increased number of items (which increases processing / search times) per node processing tends to cancel out the increases gained from the decreased height of the tree and size of the nodes. – Increased number of data items per node implies: fewer node retrievals. • So, the search times for a 2-3-4 tree and for a balanced binary tree are approximately equal and both are O(log2n) 33 Efficiency Considerations for 2-3-4 Trees • Storage • 2-3-4 Trees: a node can have three data items and up to four references. • Can be an array of references or four specific variables. • IF not all of it is used, can be considerable waste. • In 2-3-4 trees, quite common to see many nodes not full. 34