19-08-2012, 10:37 AM
I would like to know how to implement outsourced similarity search on metric data assets.
26-02-2013, 08:17 PM
review of literature for outsourced similarity search on metric data assets
Reference: https://seminarproject.net/Thread-outsou...z2M161Iz2b
28-06-2013, 03:41 PM
Outsourced Similarity Search on Metric Data Assets

Outsourced Similarity.pdf (Size: 492.47 KB)

Abstract

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the data objects most similar to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider, which then answers similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

INTRODUCTION

Advances in digital measurement and engineering technologies enable the capture of massive amounts of data in fields such as astronomy, medicine, and seismology. The effort required for data collection and processing, together with the data's potential utility for research or business, creates value for the data owner. The owner wishes to store the data and make it accessible to himself, colleagues, and other (trusted) scientists or customers. This can be supported by outsourced servers that offer low storage costs for large databases. For instance, outsourcing based on cloud computing is becoming increasingly attractive, as it promises pay-as-you-go pricing, low storage costs, and easy data access. However, care needs to be taken to safeguard data that is valuable or sensitive against unauthorized access.
In this context, we call any item in a data collection an object, individuals with authorized access query users, and the entity offering the storage service the service provider.

Shortcomings of existing methods. In the literature, a number of concepts for securing databases have been studied. Private information retrieval techniques [15] hide the user's query, e.g., the data item searched for, but not the data being queried. Such techniques are clearly not appropriate for outsourcing valuable data to an insecure server. Digital watermarking [2] establishes the data owner's identity on the data. Additional information stored in the data helps prove ownership, but it cannot prevent an attacker from illegally copying the dataset. Anonymization techniques [25] secure data by releasing only a generalized version. Aggregate statistical analysis is still possible on the generalized data, but the result of a specific query is not guaranteed to be accurate.

Indexing and NN Search in Metric Space

We review metric indexing because our proposed methods provide metric indexing on the server for efficient processing. The R*-tree [7] and the X-tree [8] are well-known disk-based indexes for multi-dimensional objects, where each object is modeled as a vector of coordinate values. Complex data objects (e.g., DNA sequences, time series) cannot be effectively represented by coordinate values. Instead, we model them in metric space, where a (black-box) distance function dist(p_i, p_j) is used to compute the dissimilarity between objects p_i and p_j. The distance function dist() is said to be a metric if it satisfies symmetry, non-negativity, and the triangle inequality. Interested readers are referred to two excellent surveys [10], [20] on metric space indexing. In this section, we only describe three representative indexing methods for a set P of metric space objects. They are the vantage-point tree (VP-tree) [29], the multi-vantage-point tree (MVP-tree) [9], and the M-tree [11].
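The metric-space model above can be illustrated with a small sketch. It is not taken from the paper: it assumes edit distance as the black-box dist() on DNA-like strings and builds a basic vantage-point tree whose nearest-neighbor search prunes subtrees with the triangle inequality. The names edit_distance, VPNode, build_vp_tree, and nearest are hypothetical.

```python
import random

def edit_distance(s, t):
    """Levenshtein distance; a metric on strings (assumed stand-in for dist())."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete from s
                           cur[j - 1] + 1,              # insert into s
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]

class VPNode:
    def __init__(self, vp, mu, inside, outside):
        self.vp, self.mu = vp, mu                   # vantage point, split radius
        self.inside, self.outside = inside, outside

def build_vp_tree(objects, dist, rng):
    """Recursively split objects by their distance to a random vantage point."""
    objects = list(objects)
    if not objects:
        return None
    vp = objects.pop(rng.randrange(len(objects)))
    if not objects:
        return VPNode(vp, 0, None, None)
    dists = [dist(vp, o) for o in objects]
    mu = sorted(dists)[len(dists) // 2]             # median distance
    inside = [o for o, d in zip(objects, dists) if d <= mu]
    outside = [o for o, d in zip(objects, dists) if d > mu]
    return VPNode(vp, mu,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))

def nearest(node, q, dist, best=(float("inf"), None)):
    """Return (distance, object) of the nearest neighbor of q."""
    if node is None:
        return best
    d = dist(q, node.vp)
    if d < best[0]:
        best = (d, node.vp)
    if d <= node.mu:
        best = nearest(node.inside, q, dist, best)
        # Outside objects o satisfy dist(q, o) > mu - d (triangle inequality).
        if d + best[0] > node.mu:
            best = nearest(node.outside, q, dist, best)
    else:
        best = nearest(node.outside, q, dist, best)
        # Inside objects o satisfy dist(q, o) >= d - mu (triangle inequality).
        if d - best[0] <= node.mu:
            best = nearest(node.inside, q, dist, best)
    return best

# Illustrative data only.
data = ["ACGT", "ACGA", "TTTT", "GGCC", "ACCT"]
tree = build_vp_tree(data, edit_distance, random.Random(0))
print(nearest(tree, "ACGG", edit_distance))
```

The search only ever calls dist(), never coordinates, which is why the same code would work for any data type that comes with a metric.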
Privacy and Security

The idea of outsourcing database services to a service provider was introduced by Hacıgümüş et al. [18]. Since then, various techniques have been developed to maintain the confidentiality of outsourced data. Given a relational table, Hacıgümüş et al. [17] map the tuples of the table into buckets and then store the encrypted tuples of those buckets at the server. At query time, the user compares the query object against the description of those buckets and then determines which buckets need to be retrieved from the server. In another proposal [12], the data owner applies the encryption function to each node separately and then stores all encrypted tuples at the server. The method of Agrawal et al. [3] employs an order-preserving function on 1D data values such that the distribution of output values is different from that of input values.

Privacy Guarantee

In this section, we employ an intuitive obfuscation-based privacy guarantee that can be adapted for metric data. In two-dimensional space, obfuscation [5] has been used to represent an object's location by a superset region called the obfuscated region. An adversary without a priori knowledge is unable to distinguish the object's actual location from other locations in the obfuscated region. The privacy value is typically expressed as the area of the obfuscated region in the two-dimensional space. However, a generic metric space has only the concept of distance, not area. Privacy thus means avoiding a small distance between an object and its obfuscated representation. We propose to obfuscate an object p by using a ring (a, dist(a, p)) whose center is a reference object a and whose radius is dist(a, p). This way, the object cannot be distinguished from any other possible object (not necessarily from the dataset) that has the same obfuscated ring representation.
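A minimal sketch of the ring representation (a, dist(a, p)) follows. It assumes 2D points with Euclidean distance purely for illustration; the function names obfuscate and consistent, and the tolerance parameter tol, are hypothetical and not from the paper.

```python
import math

def euclidean(x, y):
    """Illustrative metric; any metric dist() could be substituted."""
    return math.dist(x, y)

def obfuscate(p, anchor, dist):
    """Represent p only by the ring (anchor, dist(anchor, p))."""
    return (anchor, dist(anchor, p))

def consistent(candidate, ring, dist, tol=1e-9):
    """True if candidate lies on the ring, i.e. it is indistinguishable
    from the hidden object given only the ring. tol is an assumed
    floating-point tolerance."""
    anchor, radius = ring
    return abs(dist(anchor, candidate) - radius) <= tol

ring = obfuscate((3, 4), (0, 0), euclidean)   # hidden object (3, 4)
for candidate in [(5, 0), (0, -5), (-4, 3), (1, 1)]:
    print(candidate, consistent(candidate, ring, euclidean))
```

Every candidate at the same distance from the reference object yields the same ring, so the ring alone does not reveal which of them is the stored object.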
CONCLUSIONS

We proposed similarity search techniques for sensitive metric data, e.g., bioinformatics data, that enable outsourcing of such search. Existing solutions either offer query efficiency with no privacy, or they offer complete data privacy while sacrificing query efficiency. We introduced approaches that shift search functionality to the server. The proposed Metric Preserving Transformation (MPT) stores relative distance information at the server with respect to a private set of anchor objects. This method guarantees correctness of the final search result, but at the cost of two rounds of communication. The proposed Flexible Distance-based Hashing (FDH) method finishes in just a single round of communication, but does not guarantee retrieval of the exact result.
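The trade-off between a single round of communication and exactness can be illustrated with a generic distance-based hashing sketch. This is not the paper's FDH or MPT: it is a simplified assumption in which each object gets one bit per private anchor, the server stores buckets keyed by that bit signature (the payloads would be encrypted in practice, which is omitted here), and the user refines the fetched bucket locally. All names and the 1D data are hypothetical.

```python
from collections import defaultdict

def signature(obj, anchors, radii, dist):
    """One bit per anchor: 1 if obj lies within that anchor's radius."""
    return tuple(int(dist(obj, a) <= r) for a, r in zip(anchors, radii))

def build_buckets(objects, anchors, radii, dist):
    """Owner side: group objects by signature. Anchors and radii stay private;
    the server would see only signature -> (encrypted) payloads."""
    buckets = defaultdict(list)
    for o in objects:
        buckets[signature(o, anchors, radii, dist)].append(o)
    return dict(buckets)

def query(q, buckets, anchors, radii, dist):
    """User side: one bucket lookup, then local refinement.
    Approximate: the true nearest neighbor may sit in another bucket."""
    candidates = buckets.get(signature(q, anchors, radii, dist), [])
    return min(candidates, key=lambda o: dist(q, o), default=None)

def absdiff(x, y):
    """Illustrative metric on numbers."""
    return abs(x - y)

objects = [1, 4.9, 6, 9, 12]
anchors, radii = [0, 10], [5, 5]
buckets = build_buckets(objects, anchors, radii, absdiff)

found = query(5.1, buckets, anchors, radii, absdiff)
exact = min(objects, key=lambda o: absdiff(5.1, o))
print(found, exact)   # the two can differ, showing the loss of exactness
```

In this example the query and its true nearest neighbor fall on opposite sides of an anchor's radius, so the single-lookup answer is only approximate.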