06-03-2013, 10:02 AM
STUDY OF SEARCH ENGINES AND THEIR ALGORITHMS
WHAT ARE SEARCH ENGINES?
Search engines are huge databases of web page files that have been assembled automatically by machine. A web search engine is software designed to search for information on the World Wide Web. The search results are generally presented as a list of results, often referred to as search engine results pages (SERPs). The results may consist of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
There are two types of search engines:
• Individual. Individual search engines compile their own searchable databases on the web.
• Meta. Metasearchers do not compile databases. Instead, they search the databases of multiple sets of individual engines simultaneously (see Lesson 2).
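The metasearch idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `engine_a` and `engine_b` are hypothetical stand-ins for individual engines, the URLs are made up, and the reciprocal-rank merging rule is just one simple way to combine result lists. The engines are also called one after another rather than simultaneously, to keep the example short.

```python
from typing import Callable, Dict, List

# An "engine" here is any function that maps a query to a ranked list of URLs.
Engine = Callable[[str], List[str]]


def engine_a(query: str) -> List[str]:
    # Hypothetical individual engine with its own database.
    return ["a.example", "b.example"]


def engine_b(query: str) -> List[str]:
    # A second hypothetical engine that only partly overlaps with the first.
    return ["b.example", "c.example"]


def metasearch(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Send the query to every engine and merge the result lists."""
    scores: Dict[str, float] = {}
    for engine in engines.values():
        for rank, url in enumerate(engine(query)):
            # Assumed merging rule: a URL earns 1/(rank+1) from each engine
            # that returns it, so duplicates are combined rather than repeated.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("search engines", {"A": engine_a, "B": engine_b})
print(merged)
```

A URL returned by both engines rises to the top of the merged list, while the metasearcher itself stores no database of its own.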
HOW DO SEARCH ENGINES WORK?
Search engines compile their databases by employing "spiders" or "robots" ("bots") to crawl through web space from link to link, identifying and perusing pages. Sites that no other pages link to may be missed by spiders altogether. Web page owners may submit their URLs to search engines for "crawling" and eventual inclusion in their databases.
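A rough sketch of link-to-link crawling, using a tiny in-memory "web" instead of real HTTP requests. The page names and the `LINKS` table are invented for illustration, and breadth-first order is an assumption; real spiders use their own scheduling policies.

```python
from collections import deque
from typing import Dict, List, Set

# Toy web: each page maps to the pages it links to (hypothetical data).
LINKS: Dict[str, List[str]] = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["article"],
    "article": [],
    "orphan": ["home"],  # links out, but no other page links to it
}


def crawl(seeds: List[str], links: Dict[str, List[str]]) -> List[str]:
    """Visit pages breadth-first, following links from the seed pages."""
    seen: Set[str] = set(seeds)
    queue = deque(seeds)
    visited: List[str] = []
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real spider would fetch and store the page here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


found = crawl(["home"], LINKS)
found_with_submission = crawl(["home", "orphan"], LINKS)
print(found)
print(found_with_submission)
```

Starting from `home`, the spider never reaches `orphan` because nothing links to it. Adding `orphan` to the seed list plays the role of an owner submitting the URL for crawling.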
Whenever you search the web using a search engine, you're asking the engine to scan its index of sites and match your keywords and phrases with those in the texts of documents within the engine's database.
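Matching keywords against an index can be sketched with an inverted index, which maps each word to the documents that contain it. The sample documents and the function names are assumptions for illustration, and requiring every keyword to appear is only one possible matching rule.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical documents already collected by a crawler.
DOCS: Dict[int, str] = {
    1: "Search engines crawl the web",
    2: "Spiders crawl from link to link",
    3: "Web directories are edited by people",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every word to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(DOCS)
print(search(INDEX, "crawl web"))
```

The query is answered from the index alone; the documents themselves are not rescanned at search time.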
ARE SEARCH ENGINES ALL THE SAME?
Search engines use selected software programs to search their indexes for matching keywords and phrases, presenting their findings to you in some kind of relevance ranking. Although software programs may be similar, no two search engines are exactly the same in terms of size, speed and content; no two search engines use exactly the same ranking schemes, and not every search engine offers you exactly the same search options. Therefore, your search results will differ on every engine you use. The difference may be small, but it can also be significant. Recent estimates put search engine overlap at approximately 60 percent and unique content at around 40 percent.
HOW DO SEARCH ENGINES RANK WEB PAGES?
In ranking web pages, search engines follow a certain set of rules. These may vary from one engine to another. Their goal, of course, is to return the most relevant pages at the top of their lists. To do this, they look for the location and frequency of keywords and phrases in the web page document and, sometimes, in the HTML META tags.
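A minimal sketch of ranking by keyword location and frequency. The pages, the `TITLE_WEIGHT` value, and the scoring formula are all assumptions chosen for illustration; actual engines use their own, far more elaborate rules.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pages, each with a title and a body.
PAGES: Dict[str, Dict[str, str]] = {
    "page_a": {"title": "Python tutorial",
               "body": "Learn python step by step. Python is popular."},
    "page_b": {"title": "Cooking basics",
               "body": "A python once appeared in a cooking show."},
    "page_c": {"title": "Gardening", "body": "Roses and tulips."},
}

TITLE_WEIGHT = 3  # assumed: a keyword in the title counts three times as much


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page: Dict[str, str], keywords: List[str]) -> int:
    """Combine location (title vs. body) and frequency of each keyword."""
    title = tokenize(page["title"])
    body = tokenize(page["body"])
    return sum(TITLE_WEIGHT * title.count(k) + body.count(k) for k in keywords)


def rank(pages: Dict[str, Dict[str, str]], query: str) -> List[Tuple[str, int]]:
    """Return (page, score) pairs for matching pages, best first."""
    keywords = tokenize(query)
    scored = [(name, score(page, keywords)) for name, page in pages.items()]
    hits = [(name, s) for name, s in scored if s > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


ranking = rank(PAGES, "python")
print(ranking)
```

The page that mentions the keyword in its title and repeatedly in its body ranks first, and pages without the keyword are left out.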
WHEN DO YOU USE SEARCH ENGINES?
Search engines are best at finding unique keywords, phrases, quotes, and information buried in the full text of web pages. Because they index word by word, search engines are also useful for retrieving large numbers of documents.
NOTE: Today, the line between search engines and subject directories (see Lesson 3) is blurring. Search engines no longer limit themselves to a search mechanism alone.
[Figure: High-level architecture of a standard Web crawler]
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed.[11] As early as 2007, the Google.com search engine allowed users to search by date by clicking 'Show search tools' in the leftmost column of the initial search results page and then selecting the desired date range.
Most search engines support the Boolean operators AND, OR and NOT to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search; the engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords.[11] There is also concept-based searching, which uses statistical analysis of pages containing the words or phrases you search for. Natural language queries, in turn, allow the user to type a question in the same form one would ask it of a human. Ask.com is an example of a site that supports this.
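Boolean operators and proximity search can be sketched with plain set operations and word positions. The documents and helper names (`docs_with`, `near`) are hypothetical, and measuring distance as the difference between word positions is an assumption; engines define proximity in their own ways.

```python
import re
from typing import Dict, List, Set

# Hypothetical indexed documents.
DOCS: Dict[int, str] = {
    1: "web search engines index pages",
    2: "a search of the library catalog",
    3: "engines for cars and a web of roads",
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def docs_with(word: str) -> Set[int]:
    """Ids of documents that contain the word exactly as entered (case aside)."""
    return {doc_id for doc_id, text in DOCS.items()
            if word.lower() in tokenize(text)}


# Boolean operators map directly onto set operations.
and_hits = docs_with("search") & docs_with("engines")  # search AND engines
or_hits = docs_with("search") | docs_with("engines")   # search OR engines
not_hits = docs_with("web") - docs_with("search")      # web NOT search


def near(a: str, b: str, max_distance: int) -> Set[int]:
    """Documents where a and b occur within max_distance word positions."""
    hits: Set[int] = set()
    for doc_id, text in DOCS.items():
        words = tokenize(text)
        pos_a = [i for i, w in enumerate(words) if w == a.lower()]
        pos_b = [i for i, w in enumerate(words) if w == b.lower()]
        if any(abs(i - j) <= max_distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


print(and_hits, or_hits, not_hits)
print(near("web", "engines", 2))
```

Both document 1 and document 3 contain "web" and "engines", but only document 1 has them close together, which is the distinction a proximity search is meant to capture.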