30-05-2012, 05:46 PM
R-Trees
R-Trees-1.ppt (Size: 975.5 KB / Downloads: 125)
In the beginning…
The B-Tree provided a foundation for R-Trees. But what’s a B-Tree?
A data structure for storing sorted data with amortized run times for insertion and deletion
Often used for data stored on long latency I/O (filesystems and DBs) because child nodes can be accessed together (since they are in order)
What’s wrong with B-Trees
B-Trees cannot store new types of data
Specifically people wanted to store geometrical data and multi-dimensional data
The R-Tree provided a way to do that (thanx to Guttman ‘84)
R-Trees
R-Trees can organize any-dimensional data by representing the data by a minimum bounding box.
Each node bounds it’s children. A node can have many objects in it
The leaves point to the actual objects (stored on disk probably)
The height is always log n (it is height balanced)
Operations
Searching: look at all nodes that intersect, then recurse into those nodes. Many paths may lead nowhere
Insertion: Locate place to insert node through searching and insert.
If a node is full, then a split needs to be done
Deletion: node becomes underfull. Reinsert other nodes to maintain balance.
Problems in original R-Tree
Because the only criteria is to minimize area
Certain types of data may create small areas but large distances which will initiate a bad split.
If one group reaches a maximum number of entries, the rest of assigned without consideration of their geometry.
Greene tried to solve, but he only used the “split axis” – more criteria needs to be used