15-06-2010, 04:46 PM
Abstract:
The objective was to build a customized, multithreaded, focused crawler that crawls the web guided by the relevance of each page, thereby reducing the crawl space and finding the required pages efficiently. The World Wide Web, with over 350 million pages, continues to grow at an amazing pace of roughly a million pages per day, and about 600 GB of text changes every month. Such tremendous growth and flux pose basic limits of scale for today's generic crawlers and search engines. Even with high-end multiprocessors and exquisitely crafted crawling software, the largest crawls cover only 30-40% of the web, and refreshes take weeks to a month. Given these unprecedented scaling challenges for general-purpose crawlers and search engines, we propose a hypertext resource discovery system called a focused crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics; the topics are specified not with keywords but with exemplary documents.
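The core idea in the abstract, fetching the most relevant link first so the crawl space shrinks, can be sketched in Java (the project's language). This is a hypothetical illustration, not the project's actual design: URLs wait in a priority queue ordered by a toy relevance score, here a simple term-overlap measure against one exemplary document; a real focused crawler would use a trained topic classifier. All class names, URLs, and the scoring rule are assumptions for illustration.

```java
import java.util.*;

public class FocusedFrontier {
    // A candidate URL together with its estimated relevance score.
    static class Candidate {
        final String url;
        final double score;
        Candidate(String url, double score) { this.url = url; this.score = score; }
    }

    // Highest-scoring candidate comes out first.
    private final PriorityQueue<Candidate> queue =
        new PriorityQueue<>((a, b) -> Double.compare(b.score, a.score));
    private final Set<String> exemplarTerms;

    public FocusedFrontier(String exemplaryText) {
        exemplarTerms = new HashSet<>(Arrays.asList(exemplaryText.toLowerCase().split("\\s+")));
    }

    // Fraction of the text's terms that also appear in the exemplary document.
    public double relevance(String text) {
        String[] terms = text.toLowerCase().split("\\s+");
        int hits = 0;
        for (String t : terms) if (exemplarTerms.contains(t)) hits++;
        return terms.length == 0 ? 0.0 : (double) hits / terms.length;
    }

    // Score a discovered link by its anchor text and queue it.
    public void add(String url, String anchorText) {
        queue.add(new Candidate(url, relevance(anchorText)));
    }

    // The most relevant pending URL is crawled next.
    public String next() {
        Candidate c = queue.poll();
        return c == null ? null : c.url;
    }

    public static void main(String[] args) {
        FocusedFrontier f = new FocusedFrontier("java web crawler search engine");
        f.add("http://example.com/crawler", "focused web crawler in java");
        f.add("http://example.com/cooking", "best pasta recipes ever");
        System.out.println(f.next()); // the on-topic crawler page scores higher
    }
}
```

Because off-topic links score near zero, they sink to the bottom of the queue and may never be fetched at all, which is exactly how the crawl space gets reduced.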
Software Used:
The software requirements of the developed application are as follows:
1. The NetBeans IDE (a Java development environment) for building the application.
2. The JDBC-ODBC bridge, to link the Java application with the stored data.
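For context, here is a minimal sketch of the classic JDBC-ODBC bridge pattern as it worked in pre-Java-8 JDKs (Oracle removed the bridge in Java 8). The DSN name `crawlerdb` is a made-up example; the real data source would be whatever name is registered in the operating system's ODBC administrator.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class CrawlerStore {
    // Build the JDBC-ODBC connection URL for a given ODBC data source name.
    static String odbcUrl(String dsn) {
        return "jdbc:odbc:" + dsn;
    }

    // Open a connection through the bridge. The driver class below is the
    // one shipped with pre-Java-8 JDKs; this call fails on newer runtimes.
    static Connection open(String dsn) throws Exception {
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        return DriverManager.getConnection(odbcUrl(dsn));
    }

    public static void main(String[] args) {
        // Only demonstrate the URL format here; opening a real connection
        // needs a configured DSN and a pre-Java-8 JDK.
        System.out.println(odbcUrl("crawlerdb")); // prints jdbc:odbc:crawlerdb
    }
}
```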
Hardware Used:
The organization's present hardware setup should be sufficient for running the applications to be developed. The hardware requirements of the developed software are as follows:
1. Servers used to host and manage the application.
2. High network bandwidth for faster downloading and parsing of pages.
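The abstract describes the crawler as multithreaded, which is what lets it exploit the bandwidth listed above: several worker threads download pages concurrently. As a hedged illustration only (the class, URLs, and stubbed fetch are assumptions, not the project's code), a fixed pool of workers can drain a shared frontier like this:

```java
import java.util.*;
import java.util.concurrent.*;

public class CrawlerPool {
    // Crawl every URL in the frontier using nThreads worker threads.
    // The "fetch" is stubbed to just record the URL; a real worker would
    // download the page, score it, and push new links back to the frontier.
    static Set<String> crawl(Collection<String> urls, int nThreads) throws InterruptedException {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (String url : urls) {
            pool.submit(() -> visited.add(url)); // stand-in for fetch + parse
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> done = crawl(List.of("http://a.example", "http://b.example"), 2);
        System.out.println(done.size() + " pages crawled");
    }
}
```

A thread-safe set and a bounded pool keep the workers from duplicating or overwhelming each other, which matters once real network fetches replace the stub.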