ppt on Web Data Mining

**project girl** · 01-01-2013, 11:59 AM

Web Data Mining

Web mining is the use of data mining techniques to automatically discover interesting and potentially useful information from Web documents and services.
Web mining may be divided into three categories:
1. Web content mining
2. Web structure mining
3. Web usage mining

Web details

More than 20 billion pages in 2008
Many more documents in databases accessible from the Web
More than 4 million servers
A total of perhaps 100 terabytes
More than a million pages are added daily
Several hundred gigabytes change every month
Hyperlinks are used for navigation, endorsement, citation, criticism or plain whim

Graph terminology

The Web is a graph G = (V,E) of vertices (pages) and edges (hyperlinks)
Directed graph – directed edges (p,q), i.e. a link from p to q
Undirected graph – undirected edges (p,q)
Strongly connected component (SCC) – a maximal set of nodes such that for every pair u, v in the set there is a directed path from u to v and from v to u
Breadth first search – layer 0 is the root; layer 1 consists of all nodes pointed to by the root; layer k consists of all nodes pointed to by nodes in layer k-1 that do not already appear in an earlier layer
Diameter of a graph – the maximum, over all ordered pairs (u,v), of the shortest-path length from u to v
Average distance of the graph – the mean, over all ordered pairs (u,v), of the shortest-path length from u to v
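The BFS, diameter and average-distance definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the graph is a plain adjacency dict, the function names and the four-page graph are hypothetical, and pairs (u,v) where v cannot be reached from u are skipped when computing diameter and average distance, since their distance would be infinite.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest hop count from source to every node it can reach (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:  # first visit = shortest path in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def bfs_layers(graph, root):
    """Layer k = nodes whose shortest distance from root is exactly k."""
    dist = bfs_distances(graph, root)
    layers = [[] for _ in range(max(dist.values()) + 1)]
    for node, d in dist.items():  # dict order is BFS discovery order
        layers[d].append(node)
    return layers

def diameter_and_average_distance(graph):
    """Max and mean shortest-path length over ordered pairs (u, v), u != v.
    Assumption: unreachable pairs are skipped rather than counted as infinite."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    lengths = []
    for u in nodes:
        lengths.extend(d for v, d in bfs_distances(graph, u).items() if v != u)
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical four-page Web: a links to b and c, both link to d, d links back to a.
web = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
print(bfs_layers(web, "a"))                # [['a'], ['b', 'c'], ['d']]
print(diameter_and_average_distance(web))  # (3, 1.75)
```

Running one BFS per node costs O(|V|·(|V|+|E|)), which is fine for a toy graph; on a graph the size of the Web these quantities are normally estimated from a sample of source nodes instead.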

Citations

Lotka’s Inverse-Square Law – the number of authors publishing n papers is about 1/n² of the number publishing only one.
About 60% of all authors make a single contribution.
Less than 1% publish 10 or more papers.
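The 60% figure follows from the inverse-square law itself. If the law holds exactly for every n (an idealising assumption), the total number of authors is authors(1)·(1 + 1/4 + 1/9 + …) = authors(1)·π²/6, so the single-paper share is 6/π². A short Python check, with `lotka_share` as a hypothetical helper name:

```python
import math

# Assumption: authors(n) = authors(1) / n**2 holds exactly for every n >= 1.
# Total authors = authors(1) * (1 + 1/4 + 1/9 + ...) = authors(1) * pi**2 / 6,
# so the share of authors with exactly n papers is (6 / pi**2) / n**2.
def lotka_share(n: int) -> float:
    """Fraction of all authors who publish exactly n papers under Lotka's law."""
    return (6 / math.pi ** 2) / n ** 2

print(round(lotka_share(1), 3))  # 0.608 -> roughly 60% are single-paper authors
print(round(lotka_share(2), 3))  # 0.152 -> a quarter of the single-paper share
```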
Most Web pages are linked to by only one other page (many are not linked to at all). The number of pages with multiple in-links declines quickly.
This is the "rich get richer" effect: pages that already have many in-links tend to attract still more.
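The "rich get richer" effect is commonly modelled with preferential attachment. The sketch below uses a deliberately simple variant that is an assumption, not something taken from the slides: each new page adds one link to an existing page, chosen with probability proportional to that page's in-degree + 1. It illustrates how in-link counts pile up on a few pages while most pages receive few or none; it is not a model fitted to the real Web.

```python
import random
from collections import Counter

def preferential_attachment(num_pages, seed=0):
    """Return the in-degree of each page in a simple 'rich get richer' model.
    Hypothetical model: every new page links to one existing page picked with
    probability proportional to (in-degree + 1)."""
    rng = random.Random(seed)
    in_degree = [0]  # page 0 starts with no in-links
    # 'tickets' holds one entry per page plus one per in-link it has received,
    # so a uniform draw from it is proportional to in-degree + 1.
    tickets = [0]
    for new_page in range(1, num_pages):
        target = rng.choice(tickets)
        in_degree[target] += 1
        tickets.append(target)    # extra ticket for the in-link just gained
        in_degree.append(0)
        tickets.append(new_page)  # base ticket for the new page
    return in_degree

degrees = preferential_attachment(10_000)
histogram = Counter(degrees)
for k in range(5):
    print(k, histogram[k])  # number of pages with exactly k in-links
print("most in-links on one page:", max(degrees))
```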

Web graph structure

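The Web graph has a bow-tie shape around a central strongly connected component (SCC):
IN – pages that can reach the SCC but cannot be reached from it
OUT – pages that can be reached from the SCC but cannot reach it
Tendrils – pages that cannot reach the SCC and cannot be reached from it, but hang off IN or OUT – about 20%
Unconnected – pages in components disconnected from the rest – about 10%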
The Web is hierarchical in nature and has a strong locality feature. Almost two thirds of all links are to sites within the enterprise domain; only one third of the links are external. A higher percentage of external links are broken. Local links tend to connect pages that are only a short distance apart.
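Given one page known to lie in the central SCC, the bow-tie parts can be separated with forward and backward reachability. IN pages reach the SCC but are not reached from it, OUT pages are reached from it but do not reach it, and the leftovers are tendrils or unconnected pages. This Python sketch assumes the caller already knows such a page (the hypothetical `core_seed` parameter). In practice the largest SCC would first be found with an algorithm such as Tarjan's or Kosaraju's. "Tubes" (IN-to-OUT paths that bypass the SCC) are lumped in with tendrils here.

```python
from collections import deque

def reachable(adj, sources):
    """All nodes reachable from any node in sources (sources included)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def reverse(graph):
    """Flip every edge u -> v into v -> u."""
    rev = {}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev

def bow_tie(graph, core_seed):
    """Split nodes into SCC / IN / OUT / TENDRILS / DISCONNECTED around core_seed."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    rev = reverse(graph)
    forward = reachable(graph, [core_seed])  # reached from the core
    backward = reachable(rev, [core_seed])   # can reach the core
    scc = forward & backward
    in_set, out_set = backward - scc, forward - scc
    rest = nodes - forward - backward
    # Leftover pages reachable from IN, or able to reach OUT, are tendrils (or tubes).
    tendrils = rest & (reachable(graph, in_set) | reachable(rev, out_set))
    return {"SCC": scc, "IN": in_set, "OUT": out_set,
            "TENDRILS": tendrils, "DISCONNECTED": rest - tendrils}

# Hypothetical toy Web: a <-> b is the core, i feeds it, o hangs off it,
# t dangles from i, and x -> y is a separate island.
toy_web = {"i": ["a", "t"], "a": ["b"], "b": ["a", "o"], "x": ["y"]}
parts = bow_tie(toy_web, "a")
for name, members in parts.items():
    print(name, sorted(members))
```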

Web content mining

Web content mining discovers useful information from the contents of Web pages.
Web content is very rich, consisting of text, images, audio, video and metadata, as well as hyperlinks.
The data may be unstructured (free text), structured (data from a database) or semi-structured (HTML), although much of the Web is unstructured.
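As a small illustration of content mining on semi-structured HTML, the sketch below uses Python's standard `html.parser` to separate a page into its text (ignoring `<script>` and `<style>` bodies) and its outgoing links, then counts word frequencies. The class name, the regex tokeniser and the sample page are hypothetical choices made for this example. A real pipeline would add stop-word removal, stemming and so on.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects text outside <script>/<style> and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)

def term_frequencies(html):
    """Return (word counts, outgoing links) for one HTML page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    return Counter(words), parser.links

page = ('<html><head><title>Web mining</title><style>p {color: red}</style></head>'
        '<body><p>Web mining finds patterns in Web data.</p>'
        '<a href="/usage">Usage mining</a></body></html>')
counts, links = term_frequencies(page)
print(counts["web"], counts["mining"])  # 3 3
print(links)                            # ['/usage']
```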
