DATA MINING

**mkaasees** · 22-09-2016, 12:44 PM


Abstract: With the rapid growth in use of the World Wide Web, IP addresses give the server manager the information needed to find similar web pages (links) opened by users in a given session time. Data about web accesses is generated by the computer in human-readable form and is referred to as the web log. Storing and retrieving information from the log server is always a challenging task. Web mining is classified into three subtasks: web content mining, web structure mining and web usage mining. This paper explains web usage mining based on server log files in the open source tool RapidMiner. The proposed work analyses the usage of web pages (i.e. the browsing behaviour of users and browsing based on IP address) using two different clustering algorithms, k-means and random clustering, both of which are incorporated in RapidMiner.
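To make the k-means step concrete, here is a minimal sketch in plain Python. It is not RapidMiner's operator and not the paper's code. The per-IP page-visit counts in VISITS are invented for illustration, and seeding the centroids with the first k points is a simplification chosen to keep the result reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical data: visits per IP to (/index.html, /courses.html, /contact.html).
VISITS = {
    "10.0.0.1": (5, 4, 0),
    "10.0.0.2": (0, 1, 6),
    "10.0.0.3": (6, 5, 1),
    "10.0.0.4": (1, 0, 7),
}


def cluster_ips(visits, k):
    """Map each IP address to the index of the cluster it falls into."""
    ips = list(visits)
    labels, _ = k_means([visits[ip] for ip in ips], k)
    return dict(zip(ips, labels))


if __name__ == "__main__":
    print(cluster_ips(VISITS, 2))
```

With k = 2, the two IPs that mostly visit the index and courses pages land in one cluster and the two that mostly visit the contact page land in the other.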

Introduction:

Data mining refers to extracting, or mining, useful knowledge from large amounts of data. It is the process of analyzing data from different perspectives and summarizing it into useful information. Users must be able to collect the required information that flows through the Internet. Web Mining is the use of data mining techniques to automatically discover and extract information from Web documents and services. The Web is the user's source for extracting the required information through hyperlinks. Web Mining is divided into three classes based on the information extracted, as shown in fig.1. Extracting content such as video, audio, text and images is known as Web Content Mining. Extracting information from the structure of web pages is known as Web Structure Mining. Web Usage Mining analyzes web access by users based on their IP address and clusters them by IP address and web page similarity. To perform this analysis, web usage data must be collected from the Web server log files. Website statistics are based on server logs. A server log is a simple text file that records activity on the server.
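As a rough illustration of what "collecting web usage data from a server log" involves, the sketch below parses log lines and groups the requested pages by IP address. The post does not say which log format was used, so the Common Log Format (as written by Apache and similar servers) is assumed here, and the sample lines are invented.

```python
import re
from collections import defaultdict

# Assumed Common Log Format:
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def pages_by_ip(lines):
    """Group requested paths by client IP address, skipping unparseable lines."""
    visits = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            visits[record["ip"]].append(record["path"])
    return dict(visits)


# Invented sample lines, including one that is deliberately malformed.
SAMPLE_LOG = [
    '192.168.1.10 - - [22/Sep/2016:12:44:01 +0530] "GET /index.html HTTP/1.1" 200 1043',
    '192.168.1.10 - - [22/Sep/2016:12:44:09 +0530] "GET /courses.html HTTP/1.1" 200 2210',
    '192.168.1.22 - - [22/Sep/2016:12:45:30 +0530] "GET /index.html HTTP/1.1" 200 1043',
    'malformed line',
]

if __name__ == "__main__":
    for ip, pages in pages_by_ip(SAMPLE_LOG).items():
        print(ip, pages)
```

The per-IP page lists produced this way are the kind of input a clustering step can then work on.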

RapidMiner Tool:

RapidMiner is a software platform, developed by the company RapidMiner, that provides an integrated environment for data mining, text mining, web mining, predictive analytics and business analytics. Here the tool is used to analyze web access information for Web Usage Mining. It is open source licensed software that provides data mining and web mining, including data extraction, transformation and loading (ETL), data pre-processing, and visualization. The analyzed results can be viewed as scatter plots, bar graphs, pie charts, histograms, etc. RapidMiner is written in the Java programming language. The advantage of the tool is that results can be analyzed without any coding. The tool contains built-in Operators, each of which performs a single task within the process; the output of each operator forms the input of the next one. Different kinds of datasets can be imported into the tool, such as Excel, ARFF, text documents, web server log files, etc. RapidMiner's functionality can be extended with additional plug-ins, which are made available via the RapidMiner Marketplace. Web Usage Mining can be performed by adding the Web Mining plug-in from the tool's Marketplace.

Pre-processing:
Real-world data are generally incomplete (lacking attribute values, lacking certain attributes of interest, or containing only aggregate data), noisy (containing errors or outliers) and inconsistent (containing discrepancies in codes or names). Here, the web server log data need to be complete in order to form clusters based on IP address or similar users. So the first task is to clean the data using the Replace Missing Values operator, which is built into RapidMiner; the cleaned data can then be clustered as per the user's requirement. Pre-processing is also used to remove entries that are not required for web usage analysis, such as requests for images, audio, video, etc.
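The two cleaning steps described above can be sketched outside RapidMiner as follows. This is a plain Python stand-in, not the Replace Missing Values operator itself. The record layout, the list of media extensions, the choice of "0" as the fill value and the function names are all assumptions made for the example.

```python
# Assumed set of file extensions treated as image, audio or video requests.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp3", ".wav", ".mp4", ".avi")


def is_media_request(path):
    """True when the requested path (ignoring any query string) is a media file."""
    return path.lower().split("?", 1)[0].endswith(MEDIA_EXTENSIONS)


def replace_missing(records, field, default):
    """Return copies of the records with None or '-' in `field` replaced by `default`."""
    cleaned = []
    for record in records:
        fixed = dict(record)
        if fixed.get(field) in (None, "-"):
            fixed[field] = default
        cleaned.append(fixed)
    return cleaned


def preprocess(records):
    """Drop media requests, then fill missing byte counts with "0"."""
    pages_only = [r for r in records if not is_media_request(r["path"])]
    return replace_missing(pages_only, "size", "0")


# Invented records in the shape a log parser might produce.
RAW_RECORDS = [
    {"ip": "10.0.0.1", "path": "/index.html", "size": "-"},
    {"ip": "10.0.0.1", "path": "/logo.PNG", "size": "512"},
    {"ip": "10.0.0.2", "path": "/about.html", "size": "2048"},
]

if __name__ == "__main__":
    for record in preprocess(RAW_RECORDS):
        print(record)
```

Filtering before filling keeps the fill step from doing work on rows that are about to be discarded; the original records are left unmodified.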
