A Self-adjusting Data Structure for Multi-dimensional Point Sets Eunhui Park & David M. Mount University of Maryland Sep. 2012 Motivation • Sleator & Tarjan introduced the splay tree almost 30 years ago. • Self adjusts to access distribution • Supports insertion and deletion in O(log n) amortized time • Efficient access: • Balance property – m accesses in O((m+n) log n) time • Scanning property [Elmasry 2004] – access all items in O(n) time • Working set property – … on temporal locality • Static optimality property – Efficient access based on frequency • Static & dynamic finger [Cole, 2000] properties – … on spatial locality Is there a multi-dimensional generalization? Background • Compressed Quadtree • Hierarchical partition of space • O(n) space • O(log n) access time if augmented: • Topology tree [Frederickson1985, Har-Peled 2005 ] • Skip quadtree [Eppstein, Goodrich, Sun 2005] • Quadtreap [Mount, Park 2010] based on treap [Seidel, Aragon 1996] • Efficient approximate proximity queries • Approximate nearest neighbor search • Approximate range search Objective Quadtree + Splay tree Splay Quadtree • Like quadtrees: • A versatile geometric partition tree • Supports efficient approximate proximity queries • Like splay trees: • Adjusts to access distribution • Supports insertion/deletion in O(log n) amortized time • Supports splay tree access properties: balance, static optimality, working set, static finger Overview • BD-tree • BD-tree • Rotation • Splaying operation • Basic splaying • Splaying • Efficiency • Insertion/deletion • Search and access efficiency BD-tree Box Decomposition tree (BD-tree) : A geometric data structure based on a hierarchical decomposition of space into d-dimensional axis-aligned rectangles • Each node is associated with a region of space • • • • called a cell. Each cell is defined by an outer box and an optional inner box. Partition operations: split and shrink. Internal nodes: split nodes and shrink nodes. Each leaf has a single point or a single inner box. box cell leaves BD-tree: Partitioning Operations • Split Partitions a cell by an axis-orthogonal hyperplane that bisects the cell’s longest side. C D C E left D E split • Shrink Partitions a cell by a shrinking box, which lies within the cell. right C C C F shrink inner F outer C\F 523686 BD-tree: Promotion • By construction, nodes are generated in shrink-split pairs. We merge each into a single ternary node, called a pseudo-node. shrink node outer inner split node right left pseudo-node left right outer • Tree can be restructured through a local operation, called promotion. x y E () y x D A B C E () A C B C D D E A B Splay Quadtree • Given an internal node, x, splay(x) uses promotions to transform x to the root of the tree • This makes future accesses to x more efficient g x splay(x) f b g e d c b x c f d e Basic Splaying • As in Sleator & Tarjan, splaying is based on primitive operations: • Zig-zag z z y x F A x G y x B C F D D A E B G z y D E A C B E C F G • Zig-zig x z y y y F x D A B C G A z x B z D E C A B C E F G D E F G The Problem of Right Promotion • Inner-left convention: • If an internal node’s cell has an inner box, it resides in its left child • If necessary, left and right children are relabeled to satisfy this • This guarantees that each cell has constant complexity • Right promotion may violate this convention y x E () A x E u B If this cell has an inner box, u y B C A D v C D D v v Now, y’s cell has two inner boxes, u and v ! A u E u B C Splaying in 3-Phases • Promotions must be carefully structured to avoid this problem • 3-phased approach (3 passes from bottom to top) g R g f O e O L d c R L a g b R R f L O a b b g c c c d R e L b a d d f f e a • As in Sleator & Tarjan, amortized efficiency is established by a potential-based analysis. e Insertion and deletion • Insert(q): locate leaf x containing q add q as new leaf splay(x) x (, ) () x q x q • Insertion can be performed in O(log n) amortized time. • Deletion can be performed in O(log n) amortized time. Analogous to Splay Trees • Balance Theorem: Total access for q1, q2, …, qm takes O((m+n) log n) time. • Working Set Theorem: For each access qj, let tj be the number of different queries since the last access of qj, or since the beginning if this is the qj’s first access. Total m access queries take O( =1 log + 1 + + log ). • Static Optimality Theorem: Given a quadtree subdivision Z, where each cell z ∈ Z has an access probability pz, the entropy of Z is defined as = ∈ 1 Total m access queries take O( ∙ ()). Static Finger Theorem • 1-dim (Sleator & Tarjan 83) Total access for i1, i2, …, im takes O(m + =0 log( − + 1)). • d-dim • For a single point , - Let = (, ) × ° Static Finger Theorem • 1-dim (Sleator & Tarjan 83) Total access for i1, i2, …, im takes O(m + =0 log( − + 1)). • d-dim • But most geometric queries involve regions, not points - Let = max (, ) ∈ × Static Finger Theorem • 1-dim (Sleator & Tarjan 83) Total access for i1, i2, …, im takes O(m + =0 log( − + 1)). • d-dim • queries 1 , 2 , ⋯ , - Let = max max (, ) 1≤≤ ∈ × Static Finger Theorem • 1-dim (Sleator & Tarjan 83) Total access for i1, i2, …, im takes O(m + =0 log( − + 1)). • d-dim • For the technical reasons, need to expand - Let = (1 + ) max max (, ) 1≤≤ ∈ × Static Finger Theorem • 1-dim (Sleator & Tarjan 83) Total access for i1, i2, …, im takes O(m + =0 log( − + 1)). • d-dim • Consider an expanded ball - Let = (1 + ) max max (, ) 1≤≤ ∈ • Define the working set to be the set of points within distance from • Total access for approx. range queries 1 , 2 , ⋯ , : O( (1/ε) d-1 log || + log ) • ANN queries • Box queries × : set of points in expanded ball Conclusions • Splay Quadtree: • Self-adjusting geometric data structure • Supports insertion/deletion in O(log n) amortized time • Supports efficient approximate proximity queries • Open problems: • Other properties of standard splay trees? • Dynamic finger theorem • Scanning theorem • Better notions of distance (or generally locality) in a geometric setting? References Thank you!