12-12-2012, 03:01 PM
Domain-Specific Search Engines
INTRODUCTION
Domain-specific search engines are becoming widespread on the Web. Their focus on a single domain gives them greater capabilities and credibility than general-purpose search engines, but it also limits their query-answering spectrum to narrow queries over specific domain knowledge. The new trend in search-oriented research aims to overcome this limitation by allowing the cooperation and orchestration of search services in order to answer complex, multi-domain queries. An example is a multi-domain query, written in an SQL-like language, that retrieves the top 5 combinations of hotels and French restaurants in Paris located on the same street, giving priority to the number of stars and the user rating. Addressing such queries, called rank join queries, requires i) extracting the answers of domain-specific search systems, ii) joining them to form combinations, and iii) globally ranking each combination by means of an aggregation function, so as to return the combinations with the top aggregate scores first. As is customary in top-k query processing, we consider services that may return a large number of tuples, each equipped with a score. The tuples are typically presented in pages, and accessing such pages is costly. Moreover, services may offer different access methods.
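The three steps of a rank join query can be made concrete with a deliberately naive sketch. It assumes that the answers of the two services have already been extracted into in-memory lists (step i), that both scores are normalized to [0, 1], and that the aggregation function is a weighted sum; the names `Hotel`, `Restaurant`, `naive_rank_join` and the weights are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from heapq import nlargest
from itertools import product


@dataclass(frozen=True)
class Hotel:
    name: str
    street: str
    score: float  # hypothetical normalized score derived from the number of stars


@dataclass(frozen=True)
class Restaurant:
    name: str
    street: str
    score: float  # hypothetical normalized score derived from the user rating


def naive_rank_join(hotels, restaurants, k, w_hotel=0.5, w_rest=0.5):
    """Join on street, score every combination, return the k best (best first)."""
    # ii) join: keep only the pairs located on the same street
    combos = [(h, r) for h, r in product(hotels, restaurants) if h.street == r.street]

    # iii) rank: a monotone aggregation function (here an assumed weighted sum)
    def aggregate(pair):
        h, r = pair
        return w_hotel * h.score + w_rest * r.score

    return nlargest(k, combos, key=aggregate)
```

This version materializes every combination before ranking, which is exactly what becomes too expensive when tuples arrive in costly pages; rank join operators instead read tuples in score order and stop as soon as the top-k combinations are guaranteed.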
LITERATURE SURVEY
Top-k query answering has been addressed by a large body of research in recent years. The problem originates from rank aggregation, where all relations contain the same objects, sorted in different orders. One objective function is inspired by Fagin's algorithm, which first performs sorted accesses until at least k objects have been seen in all the input relations, and then proceeds with random accesses to return the top-k objects. The threshold algorithm (TA) has a finer stopping condition, based on an upper bound on the score of any object not yet seen. TA is instance optimal for monotone aggregation functions, i.e., its cost is always within a fixed constant factor of the optimal cost. In many practical scenarios, access costs may be significantly skewed, and suitable cost models and pulling strategies for rank aggregation have been proposed. In particular, one such cost model guides the pulling strategy so that random accesses are interleaved with sorted accesses at a rate set by the ratio between random and sorted access costs.
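The threshold algorithm described above can be sketched as follows. This is a minimal, in-memory illustration that assumes each source is a Python list of `(object_id, score)` pairs sorted by descending score and that every object appears in every list (the rank aggregation setting); sorted access is simulated by reading the lists in parallel, and random access by a dictionary lookup. The function name and return shape are hypothetical.

```python
from heapq import nlargest


def threshold_algorithm(lists, k, agg=sum):
    """Sketch of Fagin's Threshold Algorithm (TA) for rank aggregation.

    lists: one list per source of (object_id, score) pairs, sorted by score in
           descending order; as in rank aggregation, every list holds the same objects.
    agg:   a monotone aggregation function applied to a tuple of scores.
    Returns (top-k list of (object_id, aggregate_score), depth of sorted access).
    """
    # Random access: look up an object's score in any source directly.
    lookup = [dict(lst) for lst in lists]
    seen = {}  # object_id -> aggregate score, computed the first time it is seen
    top = []
    for depth in range(len(lists[0])):
        last_scores = []
        for lst in lists:
            obj, score = lst[depth]               # sorted access
            last_scores.append(score)
            if obj not in seen:                   # random accesses for all its scores
                seen[obj] = agg(tuple(src[obj] for src in lookup))
        # No unseen object can score more than the aggregate of the last scores read.
        threshold = agg(tuple(last_scores))
        top = nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return top, len(lists[0])
```

For example, with `A = [("x", 0.9), ("y", 0.8), ("z", 0.1)]`, `B = [("y", 0.9), ("z", 0.7), ("x", 0.2)]` and `k=1`, the sketch stops after two rounds of sorted access: object `y` (aggregate about 1.7) already beats the threshold 0.8 + 0.7 = 1.5, so the third entries of the lists are never read.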
SYSTEM ANALYSIS
Systems analysis is the study of sets of interacting entities, including computer systems analysis. This field is closely related to requirements analysis and operations research. It is also "an explicit formal inquiry carried out to help someone (referred to as the decision maker) identify a better course of action and make a better decision."
System analysis is used in every field where something is being developed. In the computer world, a system can also be defined as a series of components that work together to perform a function. In this stage, the analyst formulates a statement of the problem and builds a model of the real-world situation. This phase shows the important properties associated with the situation. The analysis model is a concise, precise abstraction of, and agreement on, how the desired system must be developed. In other words, the objective is to provide a model that can be understood and criticized by any application expert in the area, whether or not that expert is a programmer.
EXISTING SYSTEM
The existing work on top-k query answering, reviewed above, originates from rank aggregation, where all relations contain the same objects sorted in different orders, and builds on Fagin's algorithm and on the threshold algorithm, whose stopping condition is based on an upper bound. For the rank join setting, the HRJN (hash rank join) operator has been introduced and shown to outperform earlier approaches by a large margin.
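A hash rank join of the HRJN kind can be sketched for two inputs as below. This is a simplified illustration, not the published operator: it assumes in-memory inputs of `(join_key, score, payload)` tuples sorted by descending score, a monotone aggregation function (sum by default), and a plain round-robin pulling strategy. Each pulled tuple is inserted into a hash table for its side and probed against the other side's table; join results wait in a priority queue until their score reaches the upper bound on every result not yet produced.

```python
import heapq
from collections import defaultdict
from itertools import count


def hrjn(left, right, k, agg=lambda a, b: a + b):
    """Simplified two-input hash rank join (HRJN-style) with round-robin pulling.

    left, right: lists of (join_key, score, payload) sorted by score descending,
                 read only through sorted access. agg must be monotone.
    Returns up to k tuples (aggregate_score, left_tuple, right_tuple), best first.
    """
    inputs = (left, right)
    pos = [0, 0]                                     # sorted accesses per input
    hashed = (defaultdict(list), defaultdict(list))  # join_key -> tuples seen so far
    candidates, results = [], []                     # candidates: heap on -score
    tie = count()                                    # tie-breaker for equal scores

    def upper_bound():
        # Best score that any join result not yet produced could still reach.
        if pos[0] == 0 or pos[1] == 0:
            return float("inf")
        top = (left[0][1], right[0][1])
        last = (left[pos[0] - 1][1], right[pos[1] - 1][1])
        bounds = []
        if pos[0] < len(left):      # an unseen left tuple joined with any right tuple
            bounds.append(agg(last[0], top[1]))
        if pos[1] < len(right):     # any left tuple joined with an unseen right tuple
            bounds.append(agg(top[0], last[1]))
        return max(bounds, default=float("-inf"))

    def emit(threshold):
        while candidates and len(results) < k and -candidates[0][0] >= threshold:
            neg_score, _, l, r = heapq.heappop(candidates)
            results.append((-neg_score, l, r))

    side = 0
    while len(results) < k:
        if pos[0] >= len(left) and pos[1] >= len(right):
            break
        if pos[side] >= len(inputs[side]):           # skip an exhausted input
            side = 1 - side
        tup = inputs[side][pos[side]]                # sorted access
        pos[side] += 1
        hashed[side][tup[0]].append(tup)
        for other in hashed[1 - side].get(tup[0], []):  # probe the other hash table
            l, r = (tup, other) if side == 0 else (other, tup)
            heapq.heappush(candidates, (-agg(l[1], r[1]), next(tie), l, r))
        side = 1 - side                              # round-robin pulling strategy
        emit(upper_bound())
    emit(float("-inf"))                              # inputs exhausted: flush the rest
    return results
```

The round-robin choice in the last lines of the loop is the pulling strategy; the bound and the hash tables stay the same under any other order of invocation, which is why the choice of pulling strategy can be studied separately from the bounding scheme.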
Problem Statement: The existing approaches focus on rank aggregation and top-k selection queries, and thus cannot be readily extended to the rank join setting. Although the threshold algorithm always achieves a cost within a fixed constant factor of optimal, neither it nor Fagin's combined algorithm can be directly extended to the case of rank join.
PROPOSED SYSTEM
We address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of two or more relations and output the k combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, in this project we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose the CARS pulling strategy, which is derived at compile time and is oblivious to the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin of that of an oracle-based optimal strategy.
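The access model described above (paged sorted access, random access by join value, both with a cost) can be sketched as follows. This is not the CARS strategy, whose derivation from the service parameters is not reproduced here; it fixes one simple hypothetical strategy, sorted access on service `a` and random access on service `b`, only to show how the sequence of invocations determines the total access cost. The class `Service`, the function `sorted_random_join`, the costs, and the assumption that scores are normalized to [0, 1] are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical ranked Web service offering sorted and random access."""
    tuples: list          # (join_value, score, payload), sorted by score descending
    page_size: int
    sorted_cost: float    # assumed cost of fetching one page via sorted access
    random_cost: float    # assumed cost of one random access by join value
    spent: float = 0.0
    next_page: int = 0

    def sorted_access(self):
        """Return the next page in score order, or [] once the service is exhausted."""
        start = self.next_page * self.page_size
        page = self.tuples[start:start + self.page_size]
        if page:
            self.spent += self.sorted_cost
            self.next_page += 1
        return page

    def random_access(self, join_value):
        """Return every tuple whose join attribute equals join_value."""
        self.spent += self.random_cost
        return [t for t in self.tuples if t[0] == join_value]


def sorted_random_join(a, b, k, max_score_b=1.0, agg=lambda x, y: x + y):
    """Rank join reading `a` by sorted access and probing `b` by random access.

    Returns (top-k list of (aggregate_score, payload_a, payload_b), total access cost).
    max_score_b is an assumed upper bound on b's scores (scores normalized to [0, 1]).
    """
    results, probed = [], {}
    while True:
        page = a.sorted_access()
        for key, score_a, payload_a in page:
            if key not in probed:                 # one random access per join value
                probed[key] = b.random_access(key)
            for _, score_b, payload_b in probed[key]:
                results.append((agg(score_a, score_b), payload_a, payload_b))
        results.sort(key=lambda r: r[0], reverse=True)
        if not page:
            break
        # Every combination with a seen tuple of `a` is already complete, and unseen
        # tuples of `a` score at most page[-1][1], which bounds what is left.
        if len(results) >= k and results[k - 1][0] >= agg(page[-1][1], max_score_b):
            break
    return results[:k], a.spent + b.spent
```

Caching the answer of each random access per join value avoids paying twice for the same probe. Swapping the roles of the two services, or mixing sorted accesses on both, changes the total cost on the same data, which is the kind of decision a pulling strategy has to make.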
SYSTEM REQUIREMENTS
System requirements describe what is needed for the proposed system, and they play a very important role in its development. This chapter describes the hardware required by the system and the application software required for its development. Frontend tools help to visualize the system, while the backend supports the activities that are not visible to the end user.
FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. Feasibility analysis requires some understanding of the major requirements of the system.
Three key considerations are involved in the feasibility analysis:
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
Economical feasibility
Economical feasibility determines whether creating the system yields sufficient benefits to make its cost acceptable, or whether the cost of the system is too high. It therefore amounts to a cost-benefit analysis, including the expected savings. On the basis of the cost-benefit analysis, the proposed system is feasible and economical with respect to its pre-assumed development cost. We classified the costs according to the phase in which they occur. System development costs are usually one-time costs that do not recur after the project has been completed. To calculate the development costs, we evaluated certain cost categories, viz.