09-05-2012, 12:24 PM
give me the video,algorithms,theories,
matching,pattern,tree,extended xml tree pattern matching theories and algorithms video
09-05-2012, 12:24 PM
give me the video,algorithms,theories, matching,pattern,tree,extended xml tree pattern matching theories and algorithms video
11-02-2013, 02:52 PM
Extended XML Tree Pattern Matching: Theories and Algorithms 1Extended XML.pdf (Size: 2.41 MB / Downloads: 73) Abstract As business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions such as negation function, order-based axis, and wildcards. In this paper, we research a large set of XML tree pattern, called extended XML tree pattern, which may include P-C, A-D relationships, negation functions, wildcards, and order restriction. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms. INTRODUCTION AS business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. An XML query pattern commonly can be represented as a rooted, labeled tree (or called twig). For example, Fig. 1a shows an example XPath query: A=C and the corresponding XML tree pattern. This query finds all node C that has the parent A which has another child B. In Fig. 1b, the query answers are nodes “C1” and “C2.” Efficient matching of XML tree patterns has been widely considered as a core operation in XML query processing. In recent years, many methods [9], [13], [3], [11], [4], [25] have been proposed to match XML tree queries efficiently. In particular, Khalifa et al. [1] proposed a stack-based algorithm to match binary structural relationship including parentchild (P-C) and ancestor-descendant (A-D) relationships. The limitation of their method is that the size of useless intermediate results may become very large, even if the final results are small. Bruno et al. [3] proposed a novel holistic twig join algorithm named TwigStack, which processes the tree pattern holistically without decomposing it into several small binary relationships. PRELIMINARIES Modeling of XML Data and Extended Tree Pattern Query An XML database D is usually modeled as a rooted, nodelabeled tree (in this paper, we use D to represent the database and the related tree model exchangeably without specific declaration), element tags and attributes are mapped to nodes in the trees and the edges are used to represent the direct nesting relationships. Our primary focus is on element nodes; and it is not difficult to extend our methods to process the other types of nodes, including attribute and character data. For convenience, we distinguish between query nodes and database nodes by using the term “node” to refer to a query node and the term “element” to refer to a data element in D. Labeling Schemes Most XML query processing algorithms on XML documents rely on certain labeling schemes, such as region encoding scheme [27], prefix scheme [13], ORDPATH [19], and extended Dewey scheme [16]. In this paper, we use the extended Dewey labeling scheme, proposed in paper [16], to assign each node in XML documents a sequence of integers to capture the structure information of documents. Extended Dewey labeling scheme is a variant scheme of the prefix labeling scheme. In the prefix labeling scheme, the root is labeled by an empty string and for a nonroot element u, labelðuÞ ¼ labelðvÞ:n, where u is the nth child of v. In Extended Dewey labeling scheme, each label provides complete information about ancestors’ names and labels. THEORETICAL ANALYSIS In this section, we establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason for the suboptimality of existing holistic algorithms. The purposes of our study are 1) to provide insight into the characteristics of the holistic algorithms, and thus promotes our understanding about their behaviors; and 2) to lead to novel algorithms that can guarantee a larger optimal query class and realize better query performance. Matching Cross The existing holistic algorithms [11], [16] consist of two phases: 1) in the first phase, a list of path solutions is output as intermediate path solutions and each solution matches the individual root-to-leaf path expression; and 2) in the second phase, the path solutions are merged to produce the final answers for the whole twig query. However, for queries with parent-child (P-C) relationships, the state-ofthe- art algorithms cannot guarantee that each intermediate solution output in the first phase is useful to merge . EXPERIMENTS In this section, we present an extensive experimental study of TreeMatch on real-life and synthetic data sets. Our results verify the effectiveness, in terms of accuracy and optimality, of the TreeMatch as holistic twig join algorithms for large XML data sets. These benefits become apparent in a comparison to previously four proposed algorithms TwigStack [3], TJFast [16], OrderedTJ [17], and TwigStack- ListNot [26]. The reason that we choose these algorithms for comparison is that 1) similar to TreeMatch, both TJFast and TwigStack are two holistic twig pattern matching algorithms. But they cannot process queries with order restriction or negative edges; and 2) OrderedTJ is a holistic twig algorithm which can handle order-based XML tree pattern, but is not appropriate for queries with negative edges; and finally 3) TwigStacklistNot is proposed for queries with negative edges, but it cannot work for ordered queries. Only TreeMatch algorithm can process queries with order restriction, negative edge, and wildcards. CONCLUSIONS We have introduced a notion of matching cross to address the problem of the suboptimality in holistic XML tree patten matching algorithms. We have identified a large optimal query classes for three kinds of queries, that is Q respectively. Based on these results, we have proposed a new holistic algorithm called TreeMatch to achieve such theoretical optimal query classes. Finally, extensive experiments demonstrate the advantage of our algorithms and verify the correctness of theoretical results. |
|