07-02-2013, 02:37 PM
RANKING WEBPAGES: A Seminar Report
ABSTRACT:
The World Wide Web consists of billions of web pages, and a huge amount of information is available within them. To retrieve the required information from the World Wide Web, search engines perform a number of tasks based on their respective architectures. When a user submits a query to a search engine, it generally returns a large number of pages in response. To help users navigate the result list, various ranking methods are applied to the search results. Most of the ranking algorithms given in the literature are either link-oriented or content-oriented. An algorithm based on Visits of Links (VOL) is being devised for search engines; it works on the basis of the Weighted PageRank algorithm and takes the number of visits of the inbound links of web pages into account. The original Weighted PageRank algorithm (WPR) is an extension of the standard PageRank algorithm. WPR takes into account the importance of both the inlinks and the outlinks of pages and distributes rank scores based on the popularity of the pages. The proposed algorithm is used to find more relevant information for the user's query. This makes it useful for displaying the most valuable pages at the top of the result list based on user browsing behavior, which greatly reduces the search space.
INTRODUCTION:
The World Wide Web (Web) is today a popular and interactive medium for propagating information. It is a huge, diverse, dynamic, and widely distributed global information service center, and it is currently the largest information repository for knowledge reference. With the rapid growth of the Web, users easily get lost in its rich hyperlink structure. Providing relevant information that caters to users' needs is the primary goal of website owners. Therefore, finding content on the Web and inferring users' interests and needs from their behavior have become increasingly important. When a user submits a query to a search engine, it generally returns a large number of pages in response, and users want as many of the pages in the result list as possible to be relevant. To help users navigate the result list, various ranking methods are applied to the search results. The search engine uses these ranking methods to sort the results displayed to the user, so that the most important and useful results appear first. A variety of ranking algorithms have been developed, among them PageRank, HITS, Randomized HITS, Subspace HITS, and SimRank. Most of the proposed ranking algorithms are either link-oriented or content-oriented and do not take user usage trends into account.
In this paper, a page ranking mechanism called the Weighted PageRank algorithm based on Visits of Links (VOL) is devised for search engines. It builds on the Weighted PageRank algorithm and takes the number of visits of the inbound links of web pages into account. The original Weighted PageRank algorithm (WPR) is an extension of the standard PageRank algorithm. WPR takes into account the importance of both the inlinks and the outlinks of pages and distributes rank scores based on the popularity of the pages. The main purpose of the proposed algorithm is to find more relevant information for the user's query. This makes it possible to display the most valuable pages at the top of the result list based on user browsing behavior, which greatly reduces the search space.
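The weighting idea can be sketched in Python. The sketch uses the commonly cited WPR update WPR(u) = (1 - d) + d * Σ_{v in B(u)} WPR(v) * W_in(v,u) * W_out(v,u). Here B(u) is the set of pages linking to u. W_in(v,u) is u's inlink count divided by the summed inlink counts of all pages v links to, and W_out(v,u) is the same ratio computed over outlink counts. The VOL-style variant is an assumption made for illustration and not necessarily the report's exact formula: it replaces W_out(v,u) with the share of v's recorded outlink visits that went to u. The function name, the `visits` dictionary format, d = 0.85 and the example data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]         # page -> pages it links to (its outlinks)
Visits = Dict[Tuple[str, str], int]  # (source, target) -> times that link was followed


def weighted_pagerank(graph: Graph, visits: Optional[Visits] = None,
                      d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Iterate WPR(u) = (1 - d) + d * sum over v in B(u) of WPR(v) * W_in(v,u) * W_out(v,u).

    Without `visits`, W_out(v,u) is u's outlink count over the outlink counts of all
    pages v links to. With `visits` (hypothetical VOL-style variant), W_out(v,u) is
    replaced by the fraction of v's recorded outlink visits that went to u.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    in_count = {p: 0 for p in pages}
    for targets in graph.values():
        for t in targets:
            in_count[t] += 1
    out_count = {p: len(graph.get(p, [])) for p in pages}

    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for v, targets in graph.items():
            if not targets:
                continue
            sum_in = sum(in_count[p] for p in targets)  # >= 1 because v links to each target
            sum_out = sum(out_count[p] for p in targets)
            total_visits = sum(visits.get((v, p), 0) for p in targets) if visits is not None else 0
            for u in targets:
                w_in = in_count[u] / sum_in
                if visits is not None:
                    w_out = visits.get((v, u), 0) / total_visits if total_visits else 0.0
                else:
                    w_out = out_count[u] / sum_out if sum_out else 0.0
                new_rank[u] += d * rank[v] * w_in * w_out
        rank = new_rank
    return rank


# Usage with a tiny invented graph and invented visit counts.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
visits = {("A", "B"): 9, ("A", "C"): 1, ("B", "C"): 5, ("C", "A"): 3}
wpr = weighted_pagerank(graph)
wpr_vol = weighted_pagerank(graph, visits)
print(wpr)
print(wpr_vol)
```

In this toy data, users follow A→B far more often than A→C. The visit-based variant therefore gives B a higher score than plain WPR does, which is the browsing-behavior effect the text describes.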
DATA MINING
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies. This usually involves using database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data and may be used in further analysis.
WEB MINING:
Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types: Web usage mining, Web content mining, and Web structure mining.
Web usage mining is the process of extracting useful information from server logs, i.e. the users' browsing history. Its goal is to find out what users are looking for on the Internet.
Web content mining is the mining, extraction and integration of useful data, information and knowledge from Web page contents. Because much of the ever-expanding information on the World Wide Web, such as hypertext documents, is heterogeneous and lacks structure, automated tools for discovering, organizing, searching, and indexing it are needed.
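For web usage mining, the raw material is the server log. The following sketch counts how often each internal link was followed. It assumes Apache-style "combined" log lines, in which the referrer field names the page the user came from, and treats a request for page u with a same-site referrer v as one visit of the link v→u. The site name, the sample lines and the function name are hypothetical; real logs need more careful parsing and session handling.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse

# Request path, status, size and referrer fields of an Apache "combined" log line.
LOG_RE = re.compile(r'"(?:GET|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"')


def count_link_visits(lines: Iterable[str], site: str) -> Counter:
    """Count how often each internal link (source_path, target_path) was followed."""
    visits: Counter = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not a request line this sketch understands
        referrer = urlparse(match.group("referrer"))
        if referrer.netloc != site:
            continue  # direct visit ("-") or a link from another site
        visits[(referrer.path, match.group("path"))] += 1
    return visits


# Invented sample lines for illustration.
sample_logs = [
    '10.0.0.1 - - [07/Feb/2013:14:37:00 +0530] "GET /b.html HTTP/1.1" 200 512 "http://example.com/a.html" "Mozilla/5.0"',
    '10.0.0.2 - - [07/Feb/2013:14:38:10 +0530] "GET /b.html HTTP/1.1" 200 512 "http://example.com/a.html" "Mozilla/5.0"',
    '10.0.0.3 - - [07/Feb/2013:14:39:42 +0530] "GET /c.html HTTP/1.1" 200 734 "http://example.com/a.html" "Mozilla/5.0"',
    '10.0.0.4 - - [07/Feb/2013:14:40:05 +0530] "GET /a.html HTTP/1.1" 200 901 "-" "Mozilla/5.0"',
    '10.0.0.5 - - [07/Feb/2013:14:41:17 +0530] "GET /a.html HTTP/1.1" 200 901 "http://other.org/x.html" "Mozilla/5.0"',
]
visits = count_link_visits(sample_logs, "example.com")
print(visits)
```

The result maps (source, target) pairs to visit counts. This is the kind of per-link usage data that a visit-based ranking needs as input.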
What is PageRank?
PageRank is a method adopted by a search engine for measuring a page's "importance." After all other factors, such as the title tag and keywords, have been taken into account, the engine uses PageRank to adjust the results, so that sites deemed more "important" move up in the results page of a user's search.
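A page's "importance" can be illustrated with the widely cited iterative formulation PR(u) = (1 - d) + d * Σ_{v in B(u)} PR(v) / N(v). Here B(u) is the set of pages linking to u, N(v) is the number of outlinks of v, and d is a damping factor (0.85 is the usual textbook choice). The following minimal sketch simply repeats that update a fixed number of times. The graph and function name are invented, and pages without outlinks simply leak rank in this simplified version.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], d: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Iterate PR(u) = (1 - d) + d * sum over pages v linking to u of PR(v) / N(v),
    where N(v) is the number of outlinks of v."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for v, targets in graph.items():
            for u in targets:
                new_rank[u] += d * rank[v] / len(targets)  # v splits its rank evenly
        rank = new_rank
    return rank


# Invented example: C is linked from both A and B, B from nobody.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # → ['C', 'A', 'B']
```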
A basic overview of how a search engine ranks pages in its search engine results pages (SERPs) follows:
1) Find all pages matching the keywords of the search.
2) Rank accordingly using "on the page factors" such as keywords.
3) Factor in the inbound anchor text.
4) Adjust the results by PageRank scores.
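The four steps can be sketched as a toy scoring pipeline. Everything here is a hypothetical illustration rather than any engine's real method: the `Page` record, the extra weight of 2 for title matches, counting keyword hits in anchor text, and multiplying the text score by a precomputed PageRank value. Real engines combine many more signals in ways they do not disclose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    url: str
    title: str
    body: str
    anchor_texts: List[str] = field(default_factory=list)  # text of links pointing here
    pagerank: float = 1.0                                   # assumed precomputed offline


def rank_results(pages: List[Page], query: str) -> List[str]:
    keywords = query.lower().split()
    scored = []
    for page in pages:
        title_words = page.title.lower().split()
        words = title_words + page.body.lower().split()
        # 1) Find pages matching all keywords of the search.
        if not all(k in words for k in keywords):
            continue
        # 2) On-page factors: keyword frequency, with title occurrences weighted extra.
        score = sum(words.count(k) + 2 * title_words.count(k) for k in keywords)
        # 3) Factor in inbound anchor text that mentions the keywords.
        score += sum(a.lower().split().count(k) for a in page.anchor_texts for k in keywords)
        # 4) Adjust by the PageRank score.
        scored.append((score * page.pagerank, page.url))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in scored]


pages = [
    Page("a.html", "web ranking", "ranking web pages with links", ["ranking"], 1.0),
    Page("b.html", "web ranking", "ranking web pages with links", ["ranking"], 2.0),
    Page("c.html", "cooking", "recipes for soup", [], 5.0),
]
print(rank_results(pages, "web ranking"))  # → ['b.html', 'a.html']
```

Pages a.html and b.html have identical text, so the PageRank adjustment in step 4 alone decides their order. The high-PageRank page c.html never appears, because step 1 filters it out.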
PageRank is, without doubt, one of the hardest things for a Webmaster to manipulate ethically. However, it is possible to generate links to your site from other sites fairly simply through the use of link farms and guestbooks. Google frowns upon this kind of abuse, and many sites that have tried this have had their PageRank influence blocked. But it must be said that the abuse is still rampant, and that it can have an influence on PageRank. So, whilst not easy to do, PageRank is still subject to manipulation.
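The basic PageRank iteration shows why link farms can influence the score. Each page with no inlinks still receives the base score (1 - d), so every throwaway page that links to a target passes part of that score on. The sketch includes its own copy of the iteration so that it runs on its own. The graph, the page names and the farm size of ten are invented for illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], d: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Basic iteration PR(u) = (1 - d) + d * sum of PR(v) / N(v) over pages v linking to u."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for v, targets in graph.items():
            for u in targets:
                new_rank[u] += d * rank[v] / len(targets)
        rank = new_rank
    return rank


# Invented web: the target page T is linked only from A.
web = {"A": ["T", "B"], "B": ["A"], "T": ["A"]}
before = pagerank(web)

# Add a hypothetical link farm: ten throwaway pages that exist only to link to T.
farmed = dict(web)
for i in range(10):
    farmed[f"farm{i}"] = ["T"]
after = pagerank(farmed)

print(round(before["T"], 2), round(after["T"], 2))
```

Each farm page has no inlinks, so its own score is only 1 - d. Even so, ten of them lift T's score substantially. This is why search engines try to detect and discount such links instead of relying on the raw computation.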
The extent to which PageRank is manipulated has also changed. Most people no longer believe Google's old line that people cannot influence PageRank and the results based on it. In addition, more information about PageRank is available than ever, and people are more aware of manipulation techniques. So while PageRank is valuable, we should be careful not to overestimate its usage and capabilities. The final ranking in Google is due to a mix of factors, of which PageRank is only one. We'll go into more detail later by discussing how PageRank differs from the other ranking factors, and thus when it applies and when it doesn't. Ironically enough, PageRank's weighting factor is undeniably declining. Since the original version of this paper gave out detailed information about PageRank, it may well have contributed in some small way to the decline in weighting of the very subject it discusses.
CONCLUSION:
1) Page ranking algorithms tend to extract the relevance of a page by analyzing its hyperlinks.
2) The algorithms try to extract more information from the user's query string than just its keywords.
3) The algorithms tend to use web page classification information during ranking.