Answering Structured Queries on Unstructured Data

**seminar flower** · 01-08-2012, 03:46 PM

Abstract

There is a growing number of applications that require access to both structured and unstructured
data. Such collections of data have been referred to as dataspaces, and Dataspace Support
Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and
query, source discovery and categorization, indexing and some forms of recovery. One of the key
services of a DSSP is to provide seamless querying on the structured and unstructured data.
Querying each kind of data in isolation has been the main subject of study for the fields of
databases and information retrieval. Recently the database community has studied the problem
of answering keyword queries on structured data such as relational data or XML data. The only
combination that has not been fully explored is answering structured queries on unstructured
data.

Introduction

Significant interest has arisen recently in combining techniques from data management and
information retrieval [1, 5]. This is due to the growing number of applications that require access to
both structured and unstructured data. Examples of such applications include data management
in enterprises and government agencies, management of personal information on the desktop, and
management of digital libraries and scientific data. Such collections of data have been referred to
as dataspaces [8], and Dataspace Support Platforms (DSSPs) were proposed to offer several services
over dataspaces, including search and query, source discovery and categorization, indexing and
some forms of recovery.

One of the key services of a DSSP is to provide seamless querying on the structured and
unstructured data. Querying each kind of data in isolation has been the main subject of study for
the fields of databases and information retrieval. Recently the database community has studied
the problem of answering keyword queries on structured data such as relational data or XML
data [10, 2, 4, 21, 11].

Motivation

Broadly, our techniques apply in any context in which a user queries a structured data source
while related unstructured sources also exist. The user may want the structured
query to be expanded to include the unstructured sources that have relevant information.

Our work is done in the context of the Semex Personal Information Management (PIM) System [6].
The goal of Semex is to offer easy access to all information on one’s desktop, with possible
extension to mobile devices, imported databases, and the Web. The various types of data on one’s
desktop, such as emails and contacts, LaTeX and BibTeX files, PDF files, Word documents and
PowerPoint presentations, and cached webpages, form the major data sources managed by Semex.
On the one hand, Semex extracts instances and associations from these files by analyzing the data
formats, and creates a database. For example, from LaTeX and BibTeX files, it extracts Paper,
Person, Conference, and Journal instances and authoredBy and publishedIn associations. On the other hand,
these files contain rich text, and Semex also treats them as unstructured data.
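
To make the structured view concrete, here is a minimal sketch of how instances and associations of this kind could be represented. It is not Semex's actual data model: the `Instance` and `Association` classes, the `associations_from_bib_entry` helper, and the sample entry are all hypothetical, and the entry is assumed to be already parsed into a plain dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    class_name: str  # e.g. "Paper", "Person", "Conference"
    label: str       # hypothetical display label for the instance


@dataclass(frozen=True)
class Association:
    source: Instance
    name: str        # e.g. "authoredBy", "publishedIn"
    target: Instance


def associations_from_bib_entry(entry):
    """Turn one already-parsed BibTeX-like entry (a plain dict) into associations."""
    paper = Instance("Paper", entry["title"])
    # One authoredBy association per author; authors are separated by " and ".
    result = [
        Association(paper, "authoredBy", Instance("Person", author.strip()))
        for author in entry["author"].split(" and ")
    ]
    # Assumption: a "booktitle" field means the paper appeared at a conference.
    if "booktitle" in entry:
        venue = Instance("Conference", entry["booktitle"])
        result.append(Association(paper, "publishedIn", venue))
    return result


# Made-up entry, for illustration only.
entry = {
    "title": "An Example Paper",
    "author": "A. Author and B. Writer",
    "booktitle": "ExampleConf",
}
assocs = associations_from_bib_entry(entry)
for a in assocs:
    print(a.source.label, a.name, a.target.label)
```

The same file would also be kept as raw text, which is what makes it available to keyword search.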

Our Contributions

In this paper, we study how to extract keywords from a structured query, such that searching the
keywords on an unstructured data repository obtains the most relevant answers. The goal is to
obtain reasonably precise answers even without domain knowledge, and improve the precision if
knowledge of the schema and the structured data is available.

As depicted in Figure 1, the key element in our solution is to construct a query graph that
captures the essence of the structured query, such as the object instances mentioned in the query,
the attributes of these instances, and the associations between these instances. With this query
graph, we can ignore syntactic aspects of the query, and distinguish the query elements that convey
different concepts. The keyword set is selected from the node and edge labels of the graph.
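
The query-graph idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the `QueryGraph` class, its methods, and `candidate_keywords` are hypothetical names, the example query is made up, and the sketch simply collects every node and edge label as a candidate, leaving out the step that selects the final keyword set.

```python
from dataclasses import dataclass, field


@dataclass
class QueryGraph:
    """Hypothetical query graph: nodes are instances, edges are associations."""
    # node id -> (class label, {attribute name: value mentioned in the query})
    nodes: dict = field(default_factory=dict)
    # list of (source node id, association label, target node id)
    edges: list = field(default_factory=list)

    def add_instance(self, node_id, class_label, **attributes):
        self.nodes[node_id] = (class_label, attributes)

    def add_association(self, source, label, target):
        self.edges.append((source, label, target))


def candidate_keywords(graph):
    """Collect candidate keywords from node and edge labels, in first-seen order."""
    seen = []

    def add(term):
        if term not in seen:
            seen.append(term)

    for class_label, attributes in graph.nodes.values():
        add(class_label)
        for name, value in attributes.items():
            add(name)
            add(str(value))
    for _source, label, _target in graph.edges:
        add(label)
    return seen


# Made-up query: "papers authored by a person named Dong".
g = QueryGraph()
g.add_instance("p", "Paper")
g.add_instance("a", "Person", name="Dong")
g.add_association("p", "authoredBy", "a")
print(candidate_keywords(g))
```

Because the graph is built from instances, attributes, and associations rather than from the query text, two syntactically different queries with the same meaning would yield the same candidates.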

Related Work

The database community has recently considered how to answer keyword queries on RDB data [10,
2, 4] and on XML data [21, 11]. In this paper, we consider the reverse direction, answering
structured queries on unstructured data.

There are two bodies of research related to our work: the information-extraction approach and
the query-transformation approach. Most information-extraction work [9, 16, 17, 18, 13, 7, 3] uses
supervised learning, which is hard to scale to data in a large number of domains and hard to apply
when the query schema is unknown beforehand.

To the best of our knowledge, only one work, SCORE [15], considers transforming
structured queries into keyword search. SCORE extracts keywords from query results on structured
data and uses them to submit keyword queries that retrieve supplementary information. Our
approach extracts keywords from the query itself. It is generic in that we aim to provide reasonable
results even without the presence of structured data and domain knowledge; however, the technique
used in SCORE can serve as a supplement to our approach.
