03-08-2012, 04:08 PM
Query Management For The Semantic Web
Query Management.pdf (Size: 514.63 KB / Downloads: 29)
Abstract
This master’s thesis has focused on the development of a query management system for the
Semantic Web. The latter is a project initiated by the World Wide Web Consortium, aimed
at providing better search capabilities to the World Wide Web through an improved metadata
framework based on the Resource Description Framework (RDF) standard.
The query management system is mainly designed to interface with the Edutella peer-to-peer
network, which is an international initiative to create an RDF-based network that allows
highly advanced searches.
Introduction
Administration
This work is part of the distributed interactive learning environment that is being developed
by the Knowledge Management Research (KMR) group, Centre for User Oriented IT Design
(CID), at the Royal Institute of Technology (KTH), Stockholm.
Supervisors have been Matthias Palmér & Ambjörn Naeve, KTH, and examiner Eva Pärt-Enander,
Department of Scientific Computing, Uppsala University.
The Problem
The Semantic Web is a project initiated by the World Wide Web Consortium that aims to
provide a better way of describing resources on the World Wide Web using the Resource
Description Framework (RDF) standard (see Section 3, Background, for details). RDF-based
metadata has significant advantages, such as enabling accurate and arbitrarily complex
searches.
Edutella is an international initiative that builds on the RDF standard to implement a
Semantic Web in the form of a peer-to-peer network (see Section 3.4.1, Edutella, for details).
Searches on the Edutella network are performed with RDF-based queries that are similar to
Datalog/Prolog programs.
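The flavour of such Datalog-style querying over RDF can be sketched in a few lines of Python. Note that the triples, predicate names, and query below are invented for illustration only; they are not actual Edutella query syntax. The sketch treats RDF statements as (subject, predicate, object) tuples and evaluates a conjunction of triple patterns in which `?`-prefixed strings act as variables, much like a Datalog rule body:

```python
# Toy illustration of Datalog-style matching over RDF triples.
# The data, the vocabulary, and the query are invented examples,
# not real Edutella syntax.

# An RDF graph as (subject, predicate, object) statements.
triples = [
    ("doc1", "dc:title", "Intro to RDF"),
    ("doc1", "dc:creator", "alice"),
    ("doc2", "dc:title", "Query Languages"),
    ("doc2", "dc:creator", "bob"),
]

def match(pattern, bindings, triples):
    """Yield extended variable bindings for one triple pattern.

    Strings starting with '?' are variables, as in Datalog/Prolog.
    """
    for triple in triples:
        env = dict(bindings)
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if env.setdefault(p, t) != t:  # clash with earlier binding
                    break
            elif p != t:                       # constant does not match
                break
        else:
            yield env

def query(patterns, triples):
    """Conjunctive query: join all patterns, Datalog-style."""
    envs = [{}]
    for pattern in patterns:
        envs = [e for env in envs for e in match(pattern, env, triples)]
    return envs

# "Find titles of documents created by alice."
results = query([("?doc", "dc:creator", "alice"),
                 ("?doc", "dc:title", "?title")], triples)
print(results)  # [{'?doc': 'doc1', '?title': 'Intro to RDF'}]
```

The shared variable `?doc` joins the two patterns, which is what makes such queries arbitrarily composable compared with flat keyword search.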
How To Read This Paper
The following is an outline of the sections of this thesis:
Section 3, Background, is a description of the World Wide Web, its limitations, and the
improvements to it that are introduced by the Semantic Web initiative. This section tries to
illuminate the need for some kind of query management for the advanced query possibilities
that follow from a structured metadata system.
Section 4, Analysis, examines the query process, discusses its problems, and relates it
to the process of metadata editing. This section tries to motivate the design decisions
made for the query management system described in Section 5.
Section 5, Design, describes the design of the query management system.
Section 6, Implementation, provides brief comments about the development environment
in which the implementation of the query management system was made.
Section 7, Query Management For The Conzilla Browser, describes the query management
system applied to the concept browser Conzilla.
Section 8, Future Perspectives, is a discussion of the possible extensions to the query
management system and related issues.
Background
The World Wide Web
The Internet and the World Wide Web (WWW, or just “the web”) today consist to a large
extent of a distributed collection of HyperText Markup Language (HTML) pages. The WWW
was developed in the early 1990s at the European Organization for Nuclear Research (CERN)
in Geneva, Switzerland, as an effort to ease the sharing of information among their many
different computer systems [10]. The basis for the WWW was HTML, which was meant
to be a simple format language that structured the information in a document into logical
components such as headings, paragraphs, and links [12].
The WWW was a huge success, and as it gained more widespread use, the focus shifted
from the early content-based view to concerns about the appearance of web pages. HTML
versions 2 [13] and 3 [14] contained many new layout features which, together with scripting
languages and Java applets, increased the interactive capabilities as well as the visual
expressiveness of the web.
Searching The Web
With the growth of the web, the amount of available information has become too large to
browse manually, so various search engines have to be used to find information about a
particular topic. A major problem for these search engines is that, in order to find relevant
results, they have to scan HTML documents for words or phrases that match the keywords
used to describe the topic.
The first problem with this approach is that it is difficult to choose keywords that match
the content of all pages relevant to a particular topic without also matching many
unrelated pages. A keyword describing a topic might occur in pages unrelated to the topic,
and conversely, relevant pages do not necessarily contain the chosen keyword. The task of
matching keywords with page content without generating “false positives” is further
complicated by the fact that current web pages intermingle content with layout information.
Another, even bigger, problem with this approach is that the content of some topics is not
text at all. When searching for any type of media files, programs, or other non-textual
content, a search engine is totally dependent on the author of the content pages to label
the content appropriately.
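The contrast between keyword scanning and metadata-based lookup can be made concrete with a small Python sketch. The page texts and the metadata vocabulary below are invented for illustration; the point is only that a keyword scan both hits an unrelated page and misses a relevant one, while a lookup against explicit subject metadata (of the kind RDF is meant to provide) does neither:

```python
# Toy contrast between keyword scanning and metadata lookup.
# Page texts and the metadata vocabulary are invented for illustration.

pages = {
    "a.html": "The Jaguar reached 300 km/h on the test track.",    # car review
    "b.html": "The jaguar is the largest cat in the Americas.",    # zoology
    "c.html": "<b>Big cats</b>: lions, tigers and other felines.", # relevant, lacks keyword
}

# Hand-written subject metadata, standing in for RDF statements.
metadata = {
    "a.html": {"subject": "cars"},
    "b.html": {"subject": "zoology"},
    "c.html": {"subject": "zoology"},
}

def keyword_search(keyword):
    """Scan raw page text for the keyword, as a naive engine would."""
    return sorted(p for p, text in pages.items() if keyword in text.lower())

def metadata_search(subject):
    """Look up pages by their declared subject metadata."""
    return sorted(p for p, meta in metadata.items() if meta["subject"] == subject)

print(keyword_search("jaguar"))    # ['a.html', 'b.html'] -- false positive and a miss
print(metadata_search("zoology"))  # ['b.html', 'c.html']
```

The same mechanism extends to non-textual resources: a media file has no text to scan, but it can carry the same kind of subject metadata.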