25-08-2017, 09:32 PM
Report on Web Search Engine
INTRODUCTION
This program is inspired by Google Search, a part of Google's web services. Today Google is the number one website for searching for any kind of information. It takes text as input and returns a list of websites whose content matches, or nearly matches, the entered phrase or the words in that phrase, ordered by a special ranking system.
What is a Search Engine?
A Web search engine is a tool designed to search for information on the World Wide Web. The search results are usually presented in a list and are commonly called hits. The information may consist of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike Web directories, which are maintained by human editors, search engines operate algorithmically or use a mixture of algorithmic and human input.
Typically, a search engine works by sending out a spider to fetch as many documents as possible. Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document. Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.
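The indexer step described above can be illustrated with a small inverted index. This is only a minimal sketch, not how any real search engine builds its indices: the sample pages, the example.com URLs, and the function names tokenize, build_index, and search are all assumptions made for illustration, and ranking is left out entirely.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical sample pages standing in for documents fetched by a spider.
PAGES = {
    "http://example.com/a": "Search engines index the web",
    "http://example.com/b": "A spider fetches web documents",
    "http://example.com/c": "Cooking recipes and kitchen tips",
}

index = build_index(PAGES)
print(search(index, "web"))
print(search(index, "web spider"))
```

A query is answered by intersecting the URL sets of its words, so no document has to be re-read at query time.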
How does it work?
• Internet search engines are web search engines that search and retrieve information on the web. Most of them use a crawler-indexer architecture and depend on their crawler modules. Crawlers, also referred to as spiders, are small programs that browse the web.
• Crawlers are given an initial set of URLs whose pages they retrieve. They extract the URLs that appear on the crawled pages and pass this information to the crawl control module. The crawl control module decides which pages to visit next and gives their URLs back to the crawlers.
• The topics covered by different search engines vary according to the algorithms they use. Some search engines are programmed to search sites on a particular topic, while the crawlers of others may visit as many sites as possible.
• The crawl control module may use the link graph of a previous crawl or may use usage patterns to help in its crawling strategy.
• The indexer module extracts the words from each page it visits and records the page's URL. The result is a large lookup table that, for each word, lists the URLs of the pages where that word occurs. The table covers only the pages that were reached during the crawling process.
• A collection analysis module is another important part of the search engine architecture. It creates a utility index. A utility index may provide access to pages of a given length or pages containing a certain number of pictures on them.
• During the process of crawling and indexing, a search engine stores the pages it retrieves. They are temporarily stored in a page repository. Search engines maintain a cache of the pages they visit so that already visited pages can be retrieved faster.
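The crawling steps listed above can be sketched as a breadth-first loop. This is a simplified illustration under stated assumptions: the WEB dictionary is a hypothetical in-memory link graph that stands in for real HTTP fetches, and the names fetch, crawl, and max_pages are invented for this example. The queue plays the role of the crawl control module and the dictionary of stored texts plays the role of the page repository.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> (page text, outgoing links).
WEB = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b", "http://example.com/c"]),
    "http://example.com/b": ("page b", ["http://example.com/"]),
    "http://example.com/c": ("page c", []),
}

def fetch(url):
    """Stand-in for an HTTP fetch; returns (text, links) or None if the URL is unknown."""
    return WEB.get(url)

def crawl(seed_urls, max_pages=10):
    """Breadth-first crawl from the seed URLs, storing each fetched page once."""
    frontier = deque(seed_urls)   # crawl control: URLs waiting to be visited
    seen = set(seed_urls)         # URLs already queued, so no page is visited twice
    repository = {}               # page repository: URL -> page text
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository

repo = crawl(["http://example.com/"])
print(list(repo))
```

A first-in, first-out queue visits pages in order of their distance from the seeds. A crawl control module that uses a previous link graph or usage patterns would replace this queue with a prioritized one.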
PROBLEM DEFINITION
The search engines described above do not provide some enhanced features. This project is intended as a partial development of enhanced searching and a more responsive service.
Problem Statement:
• Today, apart from some of the search engines mentioned above, no search engine offers good responsiveness to the user's request for results.
• Today a large number of people depend on Internet technology for help, even to see how things work. There should therefore be enhanced results, which at present are best provided only by Google.
• The application should serve particular communities of users, such as people with disabilities. It is also sometimes the case that users do not know, or are unsure of, the spelling of a word such as kaleidoscope.
• None of the search engines, not even Google, supports one overlooked but useful aspect of searching: image searching with an image as the source of the search.
• None of the search engines first gives a description of what is being searched for.
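The report does not say how uncertain spellings such as kaleidoscope would be handled. One common approach, shown here purely as an assumption, is to compare the typed word with a known vocabulary and suggest the closest entry. The sketch uses get_close_matches from Python's standard difflib module. The VOCABULARY list and the suggest function are hypothetical names introduced for this example.

```python
import difflib

# Hypothetical vocabulary; a real system might take it from the words in its index.
VOCABULARY = ["kaleidoscope", "telescope", "microscope", "keyboard"]

def suggest(word, vocabulary, cutoff=0.6):
    """Return the vocabulary entry most similar to word, or None if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("kaliedoscope", VOCABULARY))
```

The cutoff parameter controls how similar a candidate must be before it is offered. Raising it gives fewer but safer suggestions.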
Proposed System:
• Our proposed system is built on ASP.NET as a database-driven search engine, which is presently the most powerful tool for website portals and similar sites. The system will have a high-end GUI for user interaction.
• The application will be web-based so that it can easily be integrated with the web and/or with GPRS mobile phones.
• The application will use the concept of swings, and some users can even run it on PWS/IIS for better performance on their Windows 98/NT/2000 machines.
• This application uses a voice interface to improve the user's search experience, as people tend to use more and more technology-driven applications. It takes sound or voice as input.
• This project also supports searching for information using an image as input, which is the most important aspect that has not been implemented to date.
• The software first provides a basic definition or description of the input and then lists the matching URLs.