13-06-2013, 04:41 PM
Prefix Tree and String Matching
Prefix Tree.pptx (Size: 109.89 KB / Downloads: 22)
Applications of String Matching
To find the longest IP prefix matching the packet destination address.
Packet filtering or packet classification mechanisms in layer 4 of TCP/IP protocols.
Dynamic dictionary matching and spell checking to molecular biology and DNA matching.
Improve the parsing process of source codes.
Basic Issues
Lack of well known methods to compare and sort strings of different lengths.
Sorting acts like a function which gives the relative position of each data element in the sorted space.
.We try to find a sort function for strings of different lengths which takes any string and finds its position with respect to others
The Binary Prefix Tree
Modified Version of Binary Tree.
Recursively dividing the data space into half.
Each subtree is a data space represented by the element in the root of the subtree.
The data strings are sorted and the binary tree is built by recursively splitting the data space in two at each step.
When the split point
is an enclosure, all included data strings are distributed in the right and left subtree
Sorting Strings
Each enclosure has a bag in which all elements contained in its space are placed first.
The remaining disjoint elements are sorted.
The above process is called Enclosurizing.
Smaller strings are put in the left, the larger strings in the right side of partition point and the matching elements are placed on the enclosure’s bag.
If the split point is not an enclosure, the regular quick sort method is applied.
Space division
Space division is encountered when a data string is replaced by its enclosure in an internal node.
According to the tree definition, all strings in the left subtree are smaller than the root and the elements in the right subtree are bigger.
When the root is replaced by another data element, it is impossible to guarantee the resulted tree satisfies this condition.
So, we need to divide the data space with respect to the new string in the root.
We need to divide the subspace either in the left or right.