Introduction to Web Mining

**mkaasees** · 23-09-2016, 10:40 AM

1455801213-webMiningOverview1.ppt (Size: 1.07 MB / Downloads: 16)

What is Web Mining?

Discovering useful information from the World-Wide Web and its usage patterns
Applications
  Web search, e.g., Google, Yahoo, …
  Vertical search, e.g., FatLens, Become, …
  Recommendations, e.g., Amazon.com
  Advertising, e.g., Google, Yahoo
  Web site design, e.g., landing page optimization

How does it differ from “classical” Data Mining?

The web is not a relation
  Textual information and linkage structure
Usage data is huge and growing rapidly
  Google’s usage logs are bigger than their web crawl
  Data generated per day is comparable to the largest conventional data warehouses
Ability to react in real time to usage patterns
  No human in the loop

The World-Wide Web

Huge
Distributed content creation, linking (no coordination)
Structured databases, unstructured text, semistructured data
Content includes truth, lies, obsolete information, contradictions, …

Our modern-day Library of Alexandria

Size of the Web

Number of pages
  Technically, infinite
    Because of dynamically generated content
    Lots of duplication (30-40%)
  Best estimate of “unique” static HTML pages comes from search engine claims
    Google = 8 billion, Yahoo = 20 billion
    Lots of marketing hype
Number of unique web sites
  Netcraft survey says 72 million sites

The web as a graph

Pages = nodes, hyperlinks = edges
  Ignore content
  Directed graph
High linkage
  8-10 links/page on average
  Power-law degree distribution
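
To make the graph view concrete, here is a minimal sketch in Python. The four-page `links` dictionary and the `degrees` helper are made up for illustration and do not come from the slides; a real crawl would have billions of nodes and would need a far more compact representation.

```python
from collections import Counter

# Hypothetical toy web: each page maps to the pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def degrees(links):
    """Return (in_degree, out_degree) dictionaries for a directed link graph."""
    out_deg = {page: len(targets) for page, targets in links.items()}
    in_count = Counter()
    for targets in links.values():
        in_count.update(targets)
    # Pages that only appear as link targets still count as nodes.
    for page in in_count:
        out_deg.setdefault(page, 0)
    in_deg = {page: in_count.get(page, 0) for page in out_deg}
    return in_deg, out_deg

in_deg, out_deg = degrees(links)

# Average number of links per page (the slides quote 8-10 for the real web).
avg_links = sum(out_deg.values()) / len(out_deg)

# How many pages have each in-degree; on real crawls this histogram is heavy-tailed.
in_degree_histogram = Counter(in_deg.values())
```

Page content plays no role here: only the link structure is kept, which is what "ignore content" means in this model.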

Power-laws galore

In-degrees
Out-degrees
Number of pages per site
Number of visitors
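
A quantity follows a power law when the number of items with value k is roughly proportional to k^(-gamma), so it appears as a straight line with slope -gamma on a log-log plot. The sketch below estimates that slope by ordinary least squares. The `loglog_slope` helper and the `toy_hist` histogram are assumptions made for illustration; a log-log fit is only a rough diagnostic, and careful studies use more robust estimators.

```python
import math

def loglog_slope(freq):
    """Least-squares slope of log(count) against log(value)."""
    points = [(math.log(k), math.log(c)) for k, c in freq.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical in-degree histogram that follows count(k) = 10000 / k^2 exactly.
toy_hist = {k: 10000 / k**2 for k in (1, 2, 4, 8, 16, 32)}

# The exponent gamma is the negated slope of the log-log line.
gamma = -loglog_slope(toy_hist)
```

On this synthetic histogram the estimate recovers the exponent 2 that was built into the data; real degree counts are noisy, especially in the tail.
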

Let’s take a closer look at structure

Broder et al. (2000) studied a crawl of 200M pages and other smaller crawls
Bow-tie structure
Not a “small world”
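
The bow-tie picture splits pages into a strongly connected core, an IN part that can reach the core, an OUT part reachable from the core, and everything else. Below is a minimal sketch of that split on a toy graph. The function names, the `toy` graph, and the choice of a `seed` page are all assumptions for illustration: the code finds the strongly connected component containing `seed`, so it gives the bow-tie only if `seed` lies in the large central component. It is not the procedure used in the original study.

```python
from collections import deque

def reachable(graph, sources):
    """Breadth-first search: all nodes reachable from any of the sources."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev

def bow_tie(graph, seed):
    """Split nodes into (core, in_part, out_part, other) relative to seed's component."""
    forward = reachable(graph, [seed])             # pages seed can reach
    backward = reachable(reverse(graph), [seed])   # pages that can reach seed
    core = forward & backward                      # strongly connected with seed
    in_part = backward - core
    out_part = forward - core
    all_nodes = set(graph) | {t for targets in graph.values() for t in targets}
    other = all_nodes - forward - backward         # tendrils and disconnected pages
    return core, in_part, out_part, other

# Hypothetical toy graph: a three-page cycle with one page feeding in,
# one page hanging off, and a disconnected pair.
toy = {
    "in1": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "out1"],
    "out1": [],
    "island": ["island2"],
}
core, in_part, out_part, other = bow_tie(toy, "a")
```

Pages in `in_part` cannot be reached from `out_part`, and pages in `other` may not connect to the core at all, which illustrates why many pairs of pages have no directed path between them.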
