21-04-2012, 04:52 PM
please reply me for system architecture diagram for extended XML tree pattern matching :theories and algorithms a 2011 project on java
21-04-2012, 04:52 PM
please reply me for system architecture diagram for extended XML tree pattern matching :theories and algorithms a 2011 project on java
28-07-2012, 02:26 PM
system architecture of extended xml tree pattern
29-06-2013, 04:52 PM
Extended XML Tree Pattern Matching: Theories and Algorithms Extended XML.pdf (Size: 1.16 MB / Downloads: 17) Abstract As business and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior works demonstrate that holistic twig pattern matching algorithm is an efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g. XPath, XQuery) define more axes and functions such as negation function, order-based axis and wildcards. In this article, we research a large set of XML tree pattern, called extended XML tree pattern, which may include P-C, A-D relationships, negation functions, wildcards and order restriction. We establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason in the proof of optimality on holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms. INTRODUCTION As business and enterprisers generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. An XML query pattern commonly can be represented as a rooted, labeled tree (or called twig). For example, Figure 1(a) shows an example XPath query: A=C and the corresponding XML tree pattern. This query finds all node C that has the parent A which has another child B. In Figure 1(b) [b]Outline The rest of the article is organized as follows. Section 2 gives the preliminaries about research problem and the processing model. Section 3 shows a set of theories about matching cross and Section 4 presents an extended XML tree pattern matching algorithm called TreeMatch. Section 5 presents thorough experimental studies between the novel algorithms and the prior methods. Finally, Section 6 presents previous work related to the XML tree pattern matching and Section 7 concludes this article. Labeling Schemes Most XML query processing algorithms on XML documents rely on certain labeling schemes, such as region encoding scheme [27], prefix scheme [13], ORDPATH [19], extended Dewey scheme [16]. In this article, we use the extended Dewey labeling scheme, proposed in paper [16], to assign each node in XML documents a sequence of integers to capture the structure information of documents. Basic properties of algorithms The algorithms for XML tree pattern matching proposed in this article have two basic yet important properties, as follows. Single-direction scan We adopt a structure, named label list, associated with each query node. The label list is a posting list (or inverted list) containing the extended Dewey labels of XML elements which have the same name, and all elements are ordered according to the document order. We use TA to denote the label list for query node A. There is a cursor for each list. It moves in the single direction to scan all elements once in increasing order. Each label in a list can be read only once. THEORETICAL ANALYSIS In this section, we establish a theoretical framework about “matching cross” which demonstrates the intrinsic reason for the sub-optimality of existing holistic algorithms. The purposes of our study are (i) to provide insight into the characteristics of the holistic algorithms, and thus promotes our understanding about their behaviors; and (ii) to lead to novel algorithms that can guarantee a larger optimal query class and realize better query performance. Matching Cross The existing holistic algorithms ([11], [16]) consist of two phases: (i) in the first phase, a list of path solutions is output as intermediate path solutions and each solution matches the individual root-to-leaf path expression; and (ii) in the second phase, the path solutions are merged to produce the final answers for the whole twig query. However, for queries with parent-child (P-C) relationships, the state-of-the-art algorithms cannot guarantee that each intermediate solution output in the first phase is useful to merge in the second phase. In other words, many useless intermediate results (i.e. path solutions) may be produced in the first phase, which is called the suboptimality of algorithms, as further illustrated in the following example. RELATED WORK In the context of semi-structured and XML databases, treebased query pattern is a very practical and important class of queries. Lore DBMS [9] and Timber [10] systems have considered various aspects of query processing on such data and queries. XML data and various issues in their storage as well as query processing using relational database systems have recently been considered in [18], [27], [22], [19]. Our holistic algorithm TreeMatch for extended tree patterns can leverage these previous techniques. From the aspect of theoretical research about the optimality of XML tree pattern matching, Choi et al. [8] developed theorems to prove that it is impossible to devise a holistic algorithm to guarantee the optimality for queries with any combination of P-C and A-D relationships. Shalem et al. [21] researched the space complexity of processing XML twig queries. Their paper showed that the upper bound of full-fledge queries with parent-child and ancestor-descendant edges are O(D), where D is the document size. In other words, their results also theoretically prove that there exists no algorithm to optimally process an arbitrary query Q=;==;¤. Our research in this article moves the frontier forward by identifying a large subclass of Q=;==;¤, which can be guaranteed to process optimally. CONCLUSIONS We have introduced a notion of matching cross to address the problem of the sub-optimality in holistic XML tree patten matching algorithms. We have identified a large optimal query classes for three kinds of queries, that is Q=;==;¤, Q=;==;¤;< and Q=;==;¤;<;:, respectively. Based on these results, we have proposed a new holistic algorithm called TreeMatch to achieve such theoretical optimal query classes. Finally, extensive experiments demonstrate the advantage of our algorithms and verify the correctness of theoretical results. |
|