Web Mining: Information and Pattern Discovery on the World Wide Web

**seminar flower** · 28-06-2012, 05:29 PM

Abstract

Application of data mining techniques to the World
Wide Web, referred to as Web mining, has been the
focus of several recent research projects and papers.
However, there is no established vocabulary, leading to
confusion when comparing research efforts. The term
Web mining has been used in two distinct ways. The
first, called Web content mining in this paper, is the
process of information discovery from sources across
the World Wide Web. The second, called Web usage
mining, is the process of mining for user browsing and
access patterns. In this paper we define Web mining
and present an overview of the various research issues,
techniques, and development efforts.

Introduction

With the explosive growth of information sources
available on the World Wide Web, it has become
increasingly necessary for users to utilize automated
tools to find the desired information resources, and to
track and analyze their usage patterns. These factors
give rise to the necessity of creating server-side and
client-side intelligent systems that can effectively mine
for knowledge. Web mining can be broadly defined as
the discovery and analysis of useful information from
the World Wide Web. This describes the automatic
search of information resources available on-line, i.e.,
Web content mining, and the discovery of user access
patterns from Web servers, i.e., Web usage mining.

A Taxonomy of Web Mining

In this section we present a taxonomy of Web mining,
i.e., Web content mining and Web usage mining.
We also describe and categorize some of the recent
work and the related tools or techniques in each area.
This taxonomy is depicted in Figure 1.

Web Content Mining

The lack of structure that permeates the information
sources on the World Wide Web makes automated
discovery of Web-based information difficult.
Traditional search engines such as Lycos, Alta Vista,
WebCrawler, ALIWEB [29], MetaCrawler, and others
provide some comfort to users, but do not generally
provide structural information nor categorize, filter,
or interpret documents. A recent study provides a
comprehensive and statistically thorough comparative
evaluation of the most popular search engines [32].

Web Usage Mining

Web usage mining is the automatic discovery of user
access patterns from Web servers. Organizations collect
large volumes of data in their daily operations,
generated automatically by Web servers and collected
in server access logs. Other sources of user information
include referrer logs which contain information about
the referring pages for each page reference, and user
registration or survey data gathered via CGI scripts.
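
As a rough illustration of what such log data looks like to a mining tool, the sketch below parses a single access log line into its fields. The paper excerpt does not name a log format; the code assumes the widely used Common/Combined Log Format, and the names `LOG_PATTERN`, `LogEntry`, and `parse_log_line` are hypothetical, introduced only for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (Combined Log Format):
#   host ident user [time] "METHOD path protocol" status size "referrer" "agent"
# The trailing referrer/agent pair is optional, so plain Common Log Format
# lines are accepted as well.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


class LogEntry(NamedTuple):
    host: str
    time: str
    method: str
    path: str
    status: int
    referrer: Optional[str]


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return the parsed fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    referrer = match.group("referrer")
    if referrer in (None, "", "-"):
        # A missing or "-" referrer means the referring page is unknown.
        referrer = None
    return LogEntry(
        host=match.group("host"),
        time=match.group("time"),
        method=match.group("method"),
        path=match.group("path"),
        status=int(match.group("status")),
        referrer=referrer,
    )
```

Each parsed entry pairs a requested page with the page that referred to it, which is the information the referrer logs described above contribute.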

Pattern Discovery from Web Transactions

As discussed in section 2.2, analysis of how users
are accessing a site is critical for determining effective
marketing strategies and optimizing the logical
structure of the Web site. Because of many unique
characteristics of the client-server model in the World
Wide Web, including differences between the physical
topology of Web repositories and user access paths,
and the difficulty in identification of unique users as
well as user sessions or transactions, it is necessary to
develop a new framework to enable the mining process.
Specifically, there are a number of issues in
preprocessing data for mining that must be addressed
before the mining algorithms can be run. These include
developing a model of access log data, developing
techniques to clean/filter the raw data to eliminate outliers
and/or irrelevant items, grouping individual page
accesses into semantic units (i.e., transactions),
integration of various data sources such as user registration
information, and specializing generic data mining
algorithms to take advantage of the specific nature of
access log data.
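
Two of the preprocessing steps listed above, cleaning the raw data and grouping page accesses into transactions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework the paper develops: it treats the client host as the user identity (which the text itself notes is unreliable), drops requests for embedded resources by file suffix, and starts a new transaction after 30 minutes of inactivity. The suffix list, the timeout, and the function names are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# One page access: (client host, request time, requested path).
Access = Tuple[str, datetime, str]

# Assumption: requests for these file types are irrelevant to usage patterns.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

# Assumption: a gap longer than this separates two transactions.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(accesses: List[Access]) -> List[Access]:
    """Drop accesses to embedded resources such as images and stylesheets."""
    return [a for a in accesses if not a[2].lower().endswith(IGNORED_SUFFIXES)]


def group_into_transactions(
    accesses: List[Access], timeout: timedelta = SESSION_TIMEOUT
) -> List[List[str]]:
    """Group each host's accesses into time-ordered lists of page paths."""
    by_host: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for host, when, path in accesses:
        by_host[host].append((when, path))

    transactions: List[List[str]] = []
    for host in sorted(by_host):
        current: List[str] = []
        last_time = None
        for when, path in sorted(by_host[host]):
            # Close the current transaction when the host was idle too long.
            if last_time is not None and when - last_time > timeout:
                transactions.append(current)
                current = []
            current.append(path)
            last_time = when
        if current:
            transactions.append(current)
    return transactions
```

The resulting lists of pages are in the shape that generic association-rule or sequential-pattern algorithms expect as input, which is one way to read the "semantic units" mentioned above.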
