Data Preparation for Mining World Wide Web Browsing Patterns

**seminar tips** · 10-12-2012, 06:14 PM

Introduction

The WWW continues to grow at an astounding rate, increasing the complexity of tasks such as web site design, web server design, and simply navigating through a web site.

An important input to these design tasks is an analysis of how a web site is used. Usage information can be used to restructure a web site so that it better serves the needs of its users.

Web usage mining is the application of data mining techniques to large web data repositories in order to produce results that can be used in these design tasks.

Some of the data mining algorithms that are commonly used in web usage mining are:
i) Association rule generation: Association rule mining techniques discover unordered correlations between items found in a database of transactions.
   e.g. 45% of the visitors who accessed the CS home page also accessed Sanjay Madria’s home page.
ii) Sequential pattern generation: This is concerned with finding inter-transaction patterns in which the presence of a set of items is followed by another item in the time-stamp-ordered transaction set.
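
The percentage in the association rule example above is the confidence of the rule. A minimal sketch of how it could be computed follows, assuming each transaction is simply the set of pages one visitor accessed. The page paths and the function names support and confidence are made up for illustration and are not from the slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every page in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    base = support(transactions, antecedent)
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base if base else 0.0


# Hypothetical transactions: each set holds the pages one visitor accessed.
transactions = [
    {"/cs/", "/cs/faculty/a.html"},
    {"/cs/", "/cs/courses.html"},
    {"/cs/", "/cs/faculty/a.html", "/cs/courses.html"},
    {"/cs/courses.html"},
]

# Rule: visitors who accessed /cs/ also accessed /cs/faculty/a.html
rule_confidence = confidence(transactions, {"/cs/"}, {"/cs/faculty/a.html"})
print(f"confidence = {rule_confidence:.0%}")
```

The order of pages inside a transaction is ignored here, which is what makes the correlation unordered. Sequential pattern generation would additionally need the time stamps.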

Browsing Behaviour Models

In some respects, web usage mining is the process of reconciling the web site developer’s view of how the site should be used with the way users are actually browsing the site.

Therefore, the two inputs required for the web usage mining process are an encoding of the site developer’s view of browsing behavior and an encoding of the actual browsing behaviors.

i) Developer’s model: The web site developer’s view of how the site should be used is inherent in the structure of the site.
* Each link between pages exists because the developer believes that the pages are related in some way.
* The content of the pages themselves provides information about how the developer expects the site to be used.

Hence, an integral step of the preprocessing phase is classifying the site pages and extracting the site topology from the HTML files that make up the web site.
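
As a rough illustration of the topology extraction step, the sketch below collects the links from each page's HTML and keeps only those that point to other pages of the same site. It uses only the Python standard library. The example URLs, the pages dictionary, and the names LinkCollector and build_topology are assumptions made for this example. Page classification is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute targets of all <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def build_topology(pages):
    """Map each page URL to the set of other site pages it links to.

    `pages` is a dict of page URL -> HTML source (hypothetical input format).
    """
    topology = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        topology[url] = {
            link for link in collector.links if link in pages and link != url
        }
    return topology


# Hypothetical three-page site.
pages = {
    "http://example.com/index.html": (
        '<a href="a.html">A</a> <a href="b.html#top">B</a> '
        '<a href="http://other.org/">external</a>'
    ),
    "http://example.com/a.html": '<a href="index.html">home</a>',
    "http://example.com/b.html": "<p>no links</p>",
}
topology = build_topology(pages)
```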

Preprocessing

In order to improve performance and minimize network traffic, most web browsers cache the pages that have been requested. As a result, when a user presses the ‘back’ button, the cached page is displayed and the web server is not aware of the repeat page access.
Proxy servers provide an intermediate level of caching and create even more problems with identifying site usage. In a web server log, all requests arriving through a proxy server carry the same identifier, even though they potentially represent more than one user. Also, due to proxy-server-level caching, a single request to the server could actually be viewed by multiple users over an extended period of time.

Session Identification & Path Completion

Session identification takes all of the page references for a given user in a log and breaks them up into user sessions.
Problem: For logs that span long periods of time, it is very likely that users will visit the web site more than once. The goal of session identification is to divide the page accesses of each user into individual sessions.
Solution: The simplest method of achieving this is through a timeout: if the time between page requests exceeds a certain limit (a timeout of 25.5 minutes was established based on empirical data), it is assumed that the user is starting a new session.
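
The timeout rule can be sketched in a few lines. This is only an illustration, assuming the log has already been reduced to a time-ordered list of (timestamp, page) pairs for a single user. The function name split_sessions and the sample requests are made up for the example.

```python
from datetime import datetime, timedelta

# Timeout value quoted above.
SESSION_TIMEOUT = timedelta(minutes=25.5)


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Split one user's requests into sessions.

    `requests` is a list of (timestamp, page) tuples sorted by timestamp.
    A new session starts whenever the gap since the previous request
    exceeds `timeout`.
    """
    sessions = []
    previous_time = None
    for timestamp, page in requests:
        if previous_time is None or timestamp - previous_time > timeout:
            sessions.append([])
        sessions[-1].append(page)
        previous_time = timestamp
    return sessions


# Hypothetical requests from one user.
start = datetime(2012, 12, 10, 18, 0)
requests = [
    (start, "A"),
    (start + timedelta(minutes=5), "B"),
    (start + timedelta(minutes=40), "C"),  # 35-minute gap: new session
    (start + timedelta(minutes=45), "D"),
]
sessions = split_sessions(requests)
```

Here the 35-minute gap between B and C exceeds the timeout, so the four requests are split into two sessions.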
Path completion fills in page references that are missing due to browser and proxy server caching.
Problem: To identify important accesses that are not recorded in the access log.
Solution: If a page request is made that is not directly linked to the last page a user requested, the referrer log can be checked to see what page the request came from. If that page is in the user’s recent history, the assumption is that the user backtracked with the ‘back’ button, calling up cached versions of the pages until a new page was requested.
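
One way to picture this heuristic is to simulate the browser's back stack, as in the sketch below. It assumes each logged request comes with its referrer and that the site topology is available as a dictionary from a page to the set of pages it links to. The function name complete_path and the sample data are hypothetical. When the referrer is missing or is not in the history, the sketch leaves the path unchanged.

```python
def complete_path(requests, links):
    """Insert the cached page views implied by 'back' button use.

    `requests` is a time-ordered list of (page, referrer) pairs for one session.
    `links` maps a page to the set of pages it links to (site topology).
    Returns the completed list of page views.
    """
    completed = []  # page views, including the inferred cached ones
    stack = []      # simulated browser history for the 'back' button
    for page, referrer in requests:
        not_linked = bool(stack) and page not in links.get(stack[-1], set())
        if not_linked and referrer in stack:
            # Assume the user pressed 'back' until reaching the referrer,
            # viewing a cached copy of each page on the way.
            while stack[-1] != referrer:
                stack.pop()
                completed.append(stack[-1])
        stack.append(page)
        completed.append(page)
    return completed


# Hypothetical topology: A links to B and D, B links to C.
links = {"A": {"B", "D"}, "B": {"C"}, "C": set(), "D": set()}
# Logged requests: D was reached from A, but the last logged page was C.
logged = [("A", None), ("B", "A"), ("C", "B"), ("D", "A")]
full_path = complete_path(logged, links)
print(full_path)
```

In this example the views of B and A between C and D never reached the server, and the sketch adds them back into the path.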
