XML retrieval pdf

**project girl** · 19-01-2013, 12:34 PM


INTRODUCTION

Information retrieval (IR) systems are often contrasted with relational databases.
Traditionally, IR systems have retrieved information from unstructured text,
by which we mean “raw” text without markup. Databases are designed
for querying relational data: sets of records that have values for predefined
attributes such as employee number, title, and salary. Information retrieval
and database systems differ fundamentally in retrieval model, data structures,
and query language, as shown in Table 10.1.
Some highly structured text search problems are most efficiently handled
by a relational database: for example, when the employee table contains an attribute
for short textual job descriptions and you want to find all employees
who are involved with invoicing.
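
The invoicing example could be sketched roughly as follows, using Python's built-in sqlite3 module. The table layout, the column names (employee_number, title, salary, job_description) and the three sample rows are assumptions made up for illustration; the text only names the attributes in passing.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER PRIMARY KEY, "
    "title TEXT, salary REAL, job_description TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Accountant", 52000.0, "Handles invoicing and payroll"),
        (2, "Engineer", 61000.0, "Maintains the build servers"),
        (3, "Clerk", 38000.0, "Invoicing for overseas customers"),
    ],
)


def employees_matching(conn, term):
    """Return employee numbers whose job description contains `term`."""
    rows = conn.execute(
        "SELECT employee_number FROM employee "
        "WHERE job_description LIKE ? ORDER BY employee_number",
        (f"%{term}%",),  # bound parameter; % and _ inside `term` act as wildcards
    )
    return [row[0] for row in rows]


invoicing_staff = employees_matching(conn, "invoicing")
print(invoicing_staff)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so both "invoicing" and "Invoicing" match here. This is plain substring matching with no ranking, which is adequate for short, highly structured fields but is not a substitute for an IR retrieval model.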

Basic XML concepts

An XML document is an ordered, labeled tree. Each node of the tree is an
XML element and is written with an opening and closing tag. An element can
have one or more XML attributes. In the XML document in Figure 10.1, the
scene element is enclosed by the two tags <scene ...> and </scene>. It
has an attribute number with value vii and two child elements, title and verse.
Figure 10.2 shows Figure 10.1 as a tree. The leaf nodes of the tree consist of
text, e.g., Shakespeare, Macbeth, and Macbeth’s castle. The tree’s internal nodes
encode either the structure of the document (title, act, and scene) or metadata
functions (author).
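
Since Figure 10.1 is not reproduced in this post, here is a minimal sketch of the structure described above, parsed with Python's standard xml.etree.ElementTree module. The XML string is a reconstruction from the description only: the root element name play is an assumption, and the verse text is left as a "..." placeholder.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the prose description; the root name "play" is assumed
# and the verse content is a placeholder, not the text of the figure.
DOC = """\
<play>
  <author>Shakespeare</author>
  <title>Macbeth</title>
  <act>
    <scene number="vii">
      <title>Macbeth's castle</title>
      <verse>...</verse>
    </scene>
  </act>
</play>
"""

root = ET.fromstring(DOC)
scene = root.find("./act/scene")

# The scene element has one attribute and two child elements.
scene_number = scene.get("number")
scene_children = [child.tag for child in scene]


def leaf_texts(node):
    """Collect the text of childless elements in document order.

    ElementTree stores text on the element itself rather than as a
    separate leaf node, so a childless element stands in for the leaf.
    """
    if len(node) == 0:
        text = (node.text or "").strip()
        return [text] if text else []
    texts = []
    for child in node:
        texts.extend(leaf_texts(child))
    return texts


print(scene_number)
print(scene_children)
print(leaf_texts(root))
```

Iterating over an element yields its children in document order, which reflects the "ordered" part of the ordered, labeled tree.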

Challenges in XML retrieval

In this section, we discuss a number of challenges that make structured retrieval
more difficult than unstructured retrieval. Recall the basic setting we assume
in structured retrieval: the collection consists of structured documents, and
queries are either structured (as in Figure 10.3) or unstructured
(e.g., summer holidays).

The first challenge in structured retrieval is that users want us to return
parts of documents (i.e., XML elements), not entire documents as IR systems
usually do in unstructured retrieval. If we query Shakespeare’s plays for
Macbeth’s castle, should we return the scene, the act, or the entire play in
Figure 10.2? In this case, the user is probably looking for the scene. On the
other hand, an otherwise unspecified search for Macbeth should return the play
of this name, not a subunit.
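
To make the granularity problem concrete, the sketch below lists every element whose subtree text contains a query phrase. It is only an illustration of why a choice is needed, not a retrieval method from the text: the function name candidates, the plain substring test, and the XML string (reconstructed from the description, with an assumed root element play and a placeholder verse) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Reconstructed document; root name and verse placeholder are assumptions.
DOC = """\
<play>
  <author>Shakespeare</author>
  <title>Macbeth</title>
  <act>
    <scene number="vii">
      <title>Macbeth's castle</title>
      <verse>...</verse>
    </scene>
  </act>
</play>
"""


def candidates(root, phrase):
    """Return the tag path of every element whose subtree text contains `phrase`.

    Paths are listed in document (pre-order) order, so an ancestor always
    appears before the descendants that also match.
    """
    hits = []

    def visit(node, path):
        path = path + [node.tag]
        if phrase in "".join(node.itertext()):
            hits.append("/".join(path))
        for child in node:
            visit(child, path)

    visit(root, [])
    return hits


root = ET.fromstring(DOC)
castle_hits = candidates(root, "Macbeth's castle")
macbeth_hits = candidates(root, "Macbeth")
print(castle_hits)
print(macbeth_hits)
```

For the phrase query, the play, the act, the scene, and the scene's title all qualify, so the system still has to decide which level to return. The bare query Macbeth matches even more elements, including the play's own title, which is why simple containment cannot settle the question by itself.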
