08-01-2013, 03:51 PM
A Survey of Techniques For Answering Top-k queries
A Survey of Techniques.docx (Size: 136.64 KB / Downloads: 24)
Abstract
Top-k queries are useful in retrieving top-k recrds from a given set of records depending on the value of a function F on their attributes. Many techniques have been proposed in database literature for answering top-k queries.These are mainly categorized into three:Sorted-list based,layer based and View based. In first category, records are sorted along each dimension and then assigned a rank to each of the records using parallel scanning method.Threshold Algorithm(TA) and Fagin’s Algorithm(FA) are the examples of sorted-list based category. Second category is layer based category,in which all the records are organized into layers such as in onion technique and robust indexing technique.Third category includes methods such as PREFER and LPTA(Linear Programming Adaptation of Threshold Algorithm) and processing is based on the materialized views.
INTRODUCTION
Top-k queries are intended for retrieving top-k records from the database which are subjected to minimization or maximization of the function F on the attributes of the relation.This kind of queries appears frequently in many applications such as college ranking,job ranking etc.Due to the popularity of top-k queries, many techniques have been proposed which are mainly includes sorted-list based,layer based and view based techniques.
Sorted-list based
Methods in this category orts all records along each dimension and then assigned an overall grade to each of the
records based on the sorted lists.For example, consider the
example of college ranking.A student want to join a college
for doing graduation and he has some preferences based on
the attributes like distance to the college,tution fee,university under which college is working,performance of the college for
previous four years etc.He then assigns grades to each of the
attributes and sorted lists are created based on this assignment corresponding to each of the attributes.Then a list
of colleges have retrieved based on their value for the query function.Here, the query function is a linear function in terms of the attributes of the records.FA and TA [1],[2],[3] are the two techniques included in this category.
Layer Based Category
The algorithms in this category organize all records into consecutive layers, such as Onion [4] and Robust Indexing Techniques [5]. The organization strategy is based on the common property among the records, such as the same convex hull layer in Onion [4]. Any top-k query can be answered by up to k layers of records. The Onion indexing is based on a geometric property of convex hull, which guarantees that the optimal value can always be found at one or more of its vertices.
The Onion indexing makes use of this property to construct convex hulls in layers with outer layers enclosing inner layers geometrically. A data record is indexed by its layer number or equivalently its depth in the layered convex hull. Queries with linear weightings issued at run time are evaluated from the outmost layer inwards. Onion indexing achieves orders of magnitude speedup against sequential linear scan when N is small compared to the cardinality of the set. The Onion technique also enables progressive retrieval, which processes and returns ranked results in a progressive manner. Furthermore, the proposed indexing can be extended into a hierarchical organization of data to accommodate both global and local queries.
Onion Technique
This technique comes under the layer based category and uses a special indexing structure for answering top-k queries. The Onion indexing is based on a geometric property of convex hull, which guarantees that the optimal value can always be found at one or more of its vertices. The Onion indexing makes use of this property to construct convex hulls in layers with outer layers enclosing inner layers geometrically. A data record is indexed by its layer number or equivalently its depth in the layered convex hull. Queries with linear weightings issued at run time are evaluated from the outmost layer inwards.
Basic idea of the onion technique is that partition the collection of d-dimensional data points into sets that are optimally linearly ordered. This property is used to construct convex hulls in layers with outer layers enclosing inner layers geometrically.
Robust Indexing Structure
This is an another layered indexing structure useful for the evaluation of top-k queries. The idea of multi-layer indexing has been also adopted by to provide robust indexing[5],[10] for top-k queries. Robustness is defined in terms of providing the best possible performance in worst case scenario, which is fully scanning the first k layers to find the top-k answers. The main idea is that if each object Oi is pushed to the deepest possible layer, its retrieval can be avoided if it is unnecessary. This is accomplished by searching for the minimum rank of each object oi in all linear scoring functions. Such rank represents the layer number, denoted l*(Oi), where object Oi is pushed to. For n objects having d scoring predicates, computing the exact layer numbers for all objects has a complexity of O(nd log n), which is an overkill when n or d are large. Approximation is used to reduce the computation cost. An approximate layer number, denoted l(Oi), is computedsuch that l(Oi) • l*(Oi), which ensures that no false positives are produced in the top-k query answer.
CONCLUSION
A surevey of top-k query processing techniques based on the different criterias have done.For this purpose, a detailed analysis of different techniques included in three important categories like sorted-list based category,layer based category and view based category have explored.