14-01-2013, 04:22 PM
An Intelligent Web Portal System for Web Information Region Integration
Abstract
The paper proposes an intelligent web portal
system (IWPS) as a way to integrate web page information.
Web portals play an essential role in the development
of the Internet, and offer useful resources and services to
large numbers of web users. Although most web portals
offer complete web site information, they simply collect
and classify web sites and do not provide a further analysis
of the relationships within web content information. This study
aims to develop an intelligent web portal system that, unlike
traditional web portals, uses a number of intelligent
methods, including information extraction (IE),
information retrieval (IR), and heuristic algorithms (HA),
to provide more meaningful and integrated information
to web users.
Introduction
The World Wide Web (WWW) has turned into an
important part of human life. As the quantity of information
on the Web grows exponentially, web portal services
become indispensable to WWW users. In addition, web
portals focus on offering more diverse services and
resources to attract more web users. Imagine that a massive
quantity of web content information is stored in many
different web sites, and a web portal serves as a map that
guides users to find the products (knowledge) they need.
Without such a map, however abundant the information
in the virtual world, users cannot quickly and conveniently
transform it into the knowledge they need. Web portals thus
give users essential direction when browsing the Internet.
Related Work
While “portal” is defined in many different ways, the
definitions are similar to some extent. Generally speaking
[1], a web portal serves as a starting point when users
connect to the Internet, like a doorway to web services
that guides users to the information they need. Cohan
[2] describes a web portal as the place where users start
browsing the Internet, one that provides search engine, email,
information, chat room and other services. Pickering [3]
believes that the major function of most web portals is to
provide a single access point to information. Because
scholars propose definitions and classifications of web
portals from different perspectives, these are
sometimes confusing [4], [5]. Strauss [6] categorizes web
portals into “Horizontal Enterprise Portals” (HEPs) and
“Vertical Enterprise Portals” (VEPs). The classification of
horizontal and vertical portals [1], [4], [7], [8] is the most
commonly used and understood method of classification.
They are defined as follows:
A. Horizontal Portal: An Internet portal system that is open
to the public and is often considered a commercial site. Most
horizontal portals offer, on a single web page, a broad array
of resources and services that any user may need. Examples
include Yahoo! [9] and MSN [10].
B. Vertical Portal: Also called a “Vortal” [1], a web site that
provides information and services of interest to consumers
within a particular industry. Vortals offer personalized,
custom-made services as defined and requested by users.
NBCi [11] is an example of a vertical portal.
Problem description
The goal of this system differs from that of ordinary
vertical portal research, so we first describe the main
objectives of this investigation. At present, most vertical
portal research extracts web information and then
integrates it, but does not analyze the relationship of
that content to specific websites. From the viewpoint of
transforming information into knowledge, if a portal
system can determine the relationship of a specific web
site to its contents and combine the two, it can provide
the user with the knowledge in a web page rather than
merely the information on that page. The difference
between traditional vertical portals [29], [30] and our portal
system is shown in Figure 1. Traditional vertical portals
collect the contents of specific web sites (or web pages)
into another web site, which then provides the integrated
information to the user. This can reduce browsing time for
WWW users, who can visit one web site and obtain large
amounts of information. This work is concerned with the
development of a more effective web portal system. Figure
1 illustrates how the intelligent portal extracts data regions
from web pages and then analyzes the relations between every
field of each data region and every field of every
other region. Finally, the system rearranges the fields of
each data region to build a more useful portal page.
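The field-by-field comparison described above can be sketched as
follows. This is only an illustration under stated assumptions: the
excerpt does not say how IWPS measures the relation between two fields,
so the Jaccard overlap of field values, the 0.2 threshold, the function
names, and the sample regions are all hypothetical.

```python
from itertools import product

def tokens(values):
    """Lowercased word set drawn from all values of one field."""
    return {w for v in values for w in str(v).lower().split()}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def field_relations(region_a, region_b):
    """Score every field of region_a against every field of region_b."""
    return {
        (fa, fb): jaccard(tokens(region_a[fa]), tokens(region_b[fb]))
        for fa, fb in product(region_a, region_b)
    }

def best_matches(region_a, region_b, threshold=0.2):
    """Map each field of region_a to its most similar field in region_b.

    The threshold is an assumed cut-off, not a value from the paper.
    """
    scores = field_relations(region_a, region_b)
    matches = {}
    for fa in region_a:
        fb = max(region_b, key=lambda f: scores[(fa, f)])
        if scores[(fa, fb)] >= threshold:
            matches[fa] = fb
    return matches

# Hypothetical data regions: field name -> values found in that field.
region_a = {
    "Name": ["Alice Chen", "Bob Lin"],
    "Title": ["Professor", "Associate Professor"],
}
region_b = {
    "Position": ["Professor", "Assistant Professor"],
    "Teacher": ["Carol Wu", "Bob Lin"],
}
matches = best_matches(region_a, region_b)
print(matches)
```

Comparing values rather than field labels lets differently named
fields line up when their contents overlap; a real system would likely
combine several such signals.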
Application
In this section, we take two simple examples to show
the application of IWPS and its final result. As one
simple example, we use web pages constructed by three
different organizations: three departments from
two schools. The three web pages contain teachers’
background information. They not only contain WCs
(teachers’ data) that need to be systematically integrated,
but also store many unnecessary WCs. Therefore, IWPS
must first identify the teachers’ data in the three web
pages, then analyze the structure of the data storage, and
finally integrate the analyzed data. As a result, IWPS
can extract the content completely and integrate the
information into a single access point on the Internet. This
access point provides all faculty information to users,
much like a web portal.
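A minimal sketch of the final integration step, assuming the teachers’
records have already been identified on each page. The field-name
mapping, the page labels, and the sample records are hypothetical and
are not taken from the paper; they only show how differently labelled
fields might be merged into one list while unneeded fields are dropped.

```python
# Assumed mapping from page-specific field labels to one shared schema.
CANONICAL = {
    "name": "name", "teacher": "name", "faculty": "name",
    "title": "title", "position": "title",
    "email": "email", "e-mail": "email",
}

def normalize(record, source):
    """Rename known fields to the shared schema and drop all others."""
    out = {"source": source}
    for key, value in record.items():
        canonical = CANONICAL.get(key.strip().lower())
        if canonical:
            out[canonical] = value.strip()
    return out

def integrate(pages):
    """Merge the records of every page into one list sorted by name."""
    merged = []
    for source, records in pages.items():
        merged.extend(normalize(r, source) for r in records)
    return sorted(merged, key=lambda r: r.get("name", ""))

# Hypothetical records extracted from three department pages.
pages = {
    "dept_a": [{"Name": "Bob Lin", "Title": "Professor", "Visitors": "1024"}],
    "dept_b": [{"Teacher": "Alice Chen", "E-mail": "alice@example.edu"}],
    "dept_c": [{"Faculty": "Carol Wu", "Position": "Lecturer "}],
}
portal = integrate(pages)
for row in portal:
    print(row)
```

Here the "Visitors" field stands in for the unnecessary content that
the integration step should discard.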
Conclusions
This study proposes the framework and the rudiments
of an intelligent web portal system (IWPS) and lays
a foundation for integrating information from multiple
source web pages. Unlike traditional web portal systems,
the IWPS proposed by this study adopts a number of intelligent
methods, including information extraction, information
retrieval, and heuristic algorithms. Testing showed
that IWPS allows a richer and more flexible integration of
information than traditional web portals. IWPS is not
merely a vertical portal that aggregates disparate
information into a single web page; it extends information
content through the system’s advanced analysis and
integration. In other words, users can obtain information
content that has been analyzed and integrated, instead of
original data from a web page. In a broader sense, with
various data storage methods and data formats currently
found on the Web, IWPS is able to integrate them into a
single storage format, which will be of great help to the
integration of web pages.