26-07-2012, 12:52 PM
WEB-LOG analysis using clustering
What is DATA MINING?
Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
Web Structure Mining
Creating a model of web organization.
Classify web pages.
Create similarity measures between web pages.
PageRank.
The Clever system.
Hyperlink-Induced Topic Search (HITS).
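As an illustration of link-based ranking, here is a minimal sketch of PageRank computed by power iteration. The toy link graph, the damping factor of 0.85, the fixed iteration count, and the function name pagerank are assumptions made for this example, not details from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform rank distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page site used only for illustration.
toy_web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph "home" collects links from every other page and ends up ranked first, while "orphan", which nothing links to, keeps only the random-jump share.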
Web Content Mining
Extends work of search engine
Improves on traditional crawler technique
Uses data mining for efficiency, effectiveness and scalability
Further divided into
Agent based approach
Database based approach
Text mining is/isn’t content mining
Crawlers
Personalization
Web Usage Mining
Applies data mining to web usage data such as web logs or clickstream data
Client perspective
Server perspective
Aids in personalization
Helps in evaluating quality and effectiveness
Preprocessing, pattern discovery and data structures
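The preprocessing step can be sketched as follows: parse raw server log lines, drop requests that carry no navigation information, and group the remaining page views into user sessions. This is only a sketch under stated assumptions: the input is taken to be in Common Log Format, users are identified by host address alone, a 30-minute inactivity timeout separates sessions, and the sample log lines and the names parse_line and sessionize are invented for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input format: Common Log Format, e.g.
# host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: requests for these file types are not page views.
STATIC_SUFFIXES = (".gif", ".png", ".jpg", ".css", ".js")


def parse_line(line):
    """Return (host, timestamp, url, status), or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), when, m.group("url"), int(m.group("status"))


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group page views per host into sessions split by an inactivity timeout."""
    by_host = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        host, when, url, status = record
        # Data cleaning: skip failed requests and embedded static resources.
        if status >= 400 or url.endswith(STATIC_SUFFIXES):
            continue
        by_host[host].append((when, url))

    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # chronological order within each host
        current = [hits[0][1]]
        for (prev_time, _), (when, url) in zip(hits, hits[1:]):
            if when - prev_time > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((host, current))
                current = []
            current.append(url)
        sessions.append((host, current))
    return sessions


# Invented sample log lines, for illustration only.
sample = [
    '10.0.0.1 - - [26/Jul/2012:12:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [26/Jul/2012:12:00:01 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [26/Jul/2012:12:05:00 +0000] "GET /products.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [26/Jul/2012:12:10:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [26/Jul/2012:13:30:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [26/Jul/2012:12:11:00 +0000] "GET /missing.html HTTP/1.0" 404 0',
]
sessions = sessionize(sample)
for host, pages in sessions:
    print(host, pages)
```

Identifying users by host address is a simplification; proxies and shared machines break it, which is one reason real preprocessing often also uses cookies or login information.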
Clustering
Clustering is an unsupervised classification technique widely used for web usage mining. Its main objective is to group a given collection of unlabeled objects into meaningful clusters.
For the web domain, the objects are web documents, references to web documents, or user visits.
Clustering of web documents is usually based on content data and aims to find documents with similar content.
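Content-based clustering needs a measure of how similar two documents are. One common choice, used here purely as an assumed example, is cosine similarity between word-count vectors. The three short "documents" and the function name cosine_similarity are made up for this sketch.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented documents, for illustration only.
docs = {
    "d1": "cheap flights to paris",
    "d2": "cheap hotels in paris",
    "d3": "python clustering tutorial",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}

sim_12 = cosine_similarity(vectors["d1"], vectors["d2"])
sim_13 = cosine_similarity(vectors["d1"], vectors["d3"])
print(f"d1 vs d2: {sim_12:.2f}")
print(f"d1 vs d3: {sim_13:.2f}")
```

The same function works for user visits if each visit is represented as a count of pages viewed instead of a count of words.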
Partitioning Algorithm…
A partitioning method constructs a partition of a database D of n objects into a set of K clusters
-> Objective: minimize the sum of squared distances between each object and the center of its cluster.
Given K, find a partition into K clusters that optimizes the chosen partitioning criterion
Global optimum: exhaustively enumerate all partitions
Heuristic methods: k-means and k-medoids algorithms
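The k-means heuristic can be sketched as below. It alternates between assigning each object to its nearest centroid and recomputing each centroid as the mean of its members, which never increases the sum of squared distances but may stop at a local rather than global optimum. The sample points, the random initialization with a fixed seed, and the names k_means and squared_distance are assumptions for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=100, seed=0):
    """Basic k-means. Returns (centroids, assignment, sum of squared distances)."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct input points at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse


# Invented 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignment, sse = k_means(points, k=2)
print("assignment:", assignment)
print("centroids:", centroids)
print("sum of squared distances:", round(sse, 4))
```

k-medoids follows the same pattern but restricts each cluster center to be one of the actual objects, which makes it less sensitive to outliers.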