26-07-2012, 03:13 PM
Crawling the Hidden Web
CrawlingtheHiddenWeb.ppt (Size: 644 KB / Downloads: 27)
Web Crawlers
Automatically traverse the Web graph, building a local repository of the portion of the Web that they visit
Traditionally, crawlers have only targeted a portion of the Web called the publicly indexable Web (PIW)
PIW – the set of pages reachable purely by following hypertext links, ignoring search forms and pages that require authentication
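The traversal described above is essentially a breadth-first walk over the link graph. A minimal sketch in Python, assuming a caller-supplied `fetch` function (e.g. wrapping `urllib.request`) and a simple page limit; all names here are illustrative, not part of any standard crawler:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_piw(seed, fetch, max_pages=100):
    """Breadth-first traversal of the publicly indexable Web:
    follow hypertext links only, never submit forms or log in."""
    repository = {}              # local copy of every visited page
    frontier = deque([seed])
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url in repository:    # skip pages already stored
            continue
        html = fetch(url)        # e.g. urllib.request.urlopen(url).read()
        repository[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            frontier.append(urljoin(url, href))  # resolve relative links
    return repository
```

Passing `fetch` in as a parameter keeps the traversal logic separate from network I/O, which also makes the sketch easy to test offline.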
The Hidden Web
Recent studies show that a significant fraction of Web content in fact lies outside the PIW
Large portions of the Web are ‘hidden’ behind search forms in searchable databases
HTML pages are dynamically generated in response to queries submitted via the search forms
Also referred to as the ‘Deep’ Web
Deep Web Stats
The Deep Web is estimated to be about 500 times larger than the PIW!
Contains 7,500 terabytes of information (March 2000)
More than 200,000 Deep Web sites exist
Sixty of the largest Deep Web sites collectively contain about 750 terabytes of information
95% of the Deep Web is publicly accessible (no fees)
Google indexes only about 16% of the PIW, so searches reach roughly 0.03% of all pages available today
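The 0.03% figure follows from the earlier stats: if the Deep Web is 500 times the PIW, the total Web is about 501 PIW-units, and 16% of one unit is a tiny slice of that total. A quick check of the arithmetic:

```python
# Take the PIW as one unit of content.
piw = 1.0
deep_web = 500 * piw            # "500 times larger than the PIW"
total_web = piw + deep_web      # 501 units in all

indexed = 0.16 * piw            # Google covers ~16% of the PIW
fraction = indexed / total_web  # share of the whole Web we can search

# 0.16 / 501 ≈ 0.00032, i.e. roughly 0.03%
print(round(fraction * 100, 2))
```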
The Solution
Build a hidden Web crawler
Can crawl and extract content from hidden databases
Enable indexing, analysis, and mining of hidden Web content
The content extracted by such crawlers can be used to categorize and classify the hidden databases
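Where a PIW crawler only follows links, a hidden Web crawler must fill in and submit search forms, then store the dynamically generated result pages. A minimal sketch, assuming a simple GET-based form with a single query field; the endpoint, field name `q`, and keyword list are hypothetical placeholders:

```python
from urllib.parse import urlencode

def build_form_query(action_url, fields):
    """Encode form fields as a GET query string, the way a browser
    submits a simple search form."""
    return action_url + "?" + urlencode(fields)

def crawl_hidden(form_url, keywords, fetch):
    """Query a searchable database once per keyword and collect the
    dynamically generated result pages for indexing or mining.

    `fetch` is a caller-supplied function (e.g. urllib-based) so the
    form-filling logic stays independent of network I/O.
    """
    extracted = {}
    for kw in keywords:
        url = build_form_query(form_url, {"q": kw})  # 'q' is an assumed field name
        extracted[url] = fetch(url)
    return extracted
```

In a real hidden Web crawler, the keyword list would itself be chosen adaptively (e.g. from terms found in earlier result pages) to maximize coverage of the underlying database.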