outsourced similarity search on metric data assets

abdulsaleem · 19-08-2012, 10:37 AM

I would like to know how to implement
outsourced similarity search on metric data assets.

26-02-2013, 08:17 PM

review of literature for outsourced similarity search on metric data assets

Reference: https://seminarproject.net/Thread-outsou...z2M161Iz2b

**study tips** · 28-06-2013, 03:41 PM

Outsourced Similarity Search on Metric Data Assets

Abstract

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service
provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the
most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for
privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this
setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the
transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to
offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy
while enabling efficient and accurate processing of similarity queries.

INTRODUCTION

Advances in digital measurement and engineering technologies
enable the capture of massive amounts of data in fields such
as astronomy, medicine, and seismology. The effort for data
collection and processing, as well as its potential utility for research
or business, create value for the data owner. The owner wishes
to store the data and allow access by himself, colleagues, and
other (trusted) scientists or customers. This can be supported
by outsourced servers that offer low storage costs for large
databases. For instance, outsourcing based on cloud computing
is becoming increasingly attractive, as it promises pay-as-you-go,
low storage costs as well as easy data access. However,
care needs to be taken to safeguard data that is valuable
or sensitive against unauthorized access. In this context, we
call any item in a data collection an object, individuals with
authorized access query users, and the entity offering the
storage service the service provider.

Shortcomings of existing methods.

In the literature, a number of concepts for securing databases
have been studied. Private information retrieval techniques
[15] hide the user’s query, e.g., the data item searched for, but
not the data being queried. To outsource valuable data to an
insecure server, such techniques are clearly not appropriate.
Digital watermarking [2] establishes the data owner’s identity
on the data. Additional information stored in the data helps
prove ownership, but it cannot prevent an attacker from
illegally copying the dataset. Anonymization techniques [25]
secure data by releasing only a generalized version. Aggregate
statistical analysis is still possible on the generalized data, but
the result of a specific query is not guaranteed to be accurate.

Indexing and NN Search in Metric Space

We review metric indexing because our proposed methods
provide metric indexing on the server for efficient processing.
The R*-tree [7] and the X-tree [8] are well-known disk-based
indexes for multi-dimensional objects, where each object
is modeled as a vector of coordinate values. Complex
data objects (e.g., DNA sequence, time series) cannot be effectively
represented by coordinate values. Instead, we model
them in metric space, where a (black-box) distance function
dist(p_i, p_j) is used to compute the dissimilarity between
objects p_i and p_j. The distance function dist() is said to be a
metric if it satisfies symmetry, non-negativity, and the triangle
inequality. Interested readers are referred to two excellent
surveys [10], [20] on metric space indexing. In this section,
we only describe three representative indexing methods for a
set P of metric space objects. They are the vantage-point tree
(VP-tree) [29], the multi-vantage-point tree (MVP-tree) [9], and
the M-tree [11].
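
To make the role of the triangle inequality concrete, here is a minimal sketch of a vantage-point tree with nearest-neighbor search. It is an illustration only, not the exact structure from [29]: the function names, the random choice of vantage point, and the use of edit distance as the black-box dist() are all assumptions made for this example.

```python
import random

def edit_distance(a, b):
    """Levenshtein distance; used here as an example of a black-box metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class VPNode:
    def __init__(self, vantage, radius, inside, outside):
        self.vantage = vantage    # object chosen as vantage point
        self.radius = radius      # median distance from the vantage point
        self.inside = inside      # subtree with dist(vantage, p) <= radius
        self.outside = outside    # subtree with dist(vantage, p) > radius

def build_vp_tree(points, dist, rng=None):
    """Recursively split the objects around the median distance to a vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0, None, None)
    dists = [dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))

def nearest(node, query, dist, best=None):
    """Return (distance, object) of the nearest neighbor of query."""
    if node is None:
        return best
    d = dist(query, node.vantage)
    if best is None or d < best[0]:
        best = (d, node.vantage)
    if d <= node.radius:
        best = nearest(node.inside, query, dist, best)
        # Outside objects are farther than radius - d from the query,
        # so they can only help if d + best distance exceeds the radius.
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        # Inside objects are at least d - radius away from the query.
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
    return best

sequences = ["ACGT", "ACGA", "TTGA", "GGGG", "ACCT", "TGCA", "AAAA"]
tree = build_vp_tree(sequences, edit_distance)
print(nearest(tree, "ACGG", edit_distance))
```

The search never inspects the objects themselves, only distances, which is why the same code works for strings, time series, or any other data with a metric.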

Privacy and Security

The idea of outsourcing database services to a service provider
was introduced by Hacıgümüş et al. [18]. Since then, various
techniques have been developed to maintain the confidentiality
of outsourced data. Given a relational table, Hacıgümüş et
al. [17] map the tuples of the table into buckets and then
store the encrypted tuples of those buckets at the server. At
query time, the user compares the query object against the
description of those buckets, and then determines the necessary
buckets that need to be retrieved from the server. In another
proposal [12], the data owner applies the encryption function
on each node separately and then stores all encrypted tuples at
the server. The method of Agrawal et al. [3] employs an order-preserving
function on 1D data values such that the distribution
of output values is different from that of input values.
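
The bucket-based idea described above can be sketched roughly as follows. This is only a toy illustration of the description given here, not the scheme of [17]: the bucket boundaries, the function names, and the sample attribute are hypothetical, and toy_cipher is a deliberately insecure stand-in for a real encryption function.

```python
import hashlib
import json

def toy_cipher(key, data):
    """XOR with a SHA-256-derived keystream. NOT secure; placeholder for real encryption.
    Applying it twice with the same key returns the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

# Hypothetical bucket description, kept by the owner and the trusted users.
# Each bucket covers the half-open value range [lo, hi).
BUCKETS = [(0, 30), (30, 60), (60, 200)]

def bucket_of(value):
    for bucket_id, (lo, hi) in enumerate(BUCKETS):
        if lo <= value < hi:
            return bucket_id
    raise ValueError("value outside all buckets")

def outsource(rows, key, attr):
    """Owner side: the server only sees bucket ids and encrypted tuples."""
    server = {}
    for row in rows:
        blob = toy_cipher(key, json.dumps(row).encode())
        server.setdefault(bucket_of(row[attr]), []).append(blob)
    return server

def range_query(server, key, attr, lo, hi):
    """User side: pick the buckets overlapping [lo, hi], fetch, decrypt, filter."""
    needed = [b for b, (blo, bhi) in enumerate(BUCKETS) if blo <= hi and lo < bhi]
    result = []
    for bucket_id in needed:
        for blob in server.get(bucket_id, []):
            row = json.loads(toy_cipher(key, blob).decode())
            if lo <= row[attr] <= hi:   # discard false positives from the bucket
                result.append(row)
    return result

rows = [{"name": "a", "age": 25}, {"name": "b", "age": 31},
        {"name": "c", "age": 59}, {"name": "d", "age": 70}]
server = outsource(rows, b"secret", "age")
print(range_query(server, b"secret", "age", 28, 60))
```

Coarser buckets reveal less about the values but make the user download and filter more false positives.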

Privacy Guarantee

In this section, we employ an intuitive obfuscation-based
privacy guarantee that can be adapted for metric data.
In the two-dimensional space, obfuscation [5] has been
used to represent an object’s location by a superset region
called the obfuscated region. An adversary without a priori
knowledge is unable to distinguish the object’s actual location
from other locations in the obfuscated region. The privacy
value is typically expressed as the area of the obfuscated
region in the two-dimensional space. However, for generic
metric space, there is only the concept of distance but not area.
Privacy thus means avoiding a small distance between an object
and its obfuscated representation. We propose to obfuscate
an object p by using a ring (a, dist(a, p)) whose center is a
reference object a and whose radius is dist(a, p). This way, the object
cannot be distinguished from any other possible object (not
necessarily from the dataset) that has the same obfuscated
ring representation.
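
A small sketch of the ring representation follows. The helper names and the Manhattan distance on integer points are assumptions chosen for the example; any metric could be passed in as dist. It shows two different objects collapsing to the same ring, and the bounds on dist(q, p) that the triangle inequality still allows a server to derive from a ring.

```python
def manhattan(x, y):
    """L1 distance on integer tuples; stands in for the black-box metric dist()."""
    return sum(abs(a - b) for a, b in zip(x, y))

def obfuscate(p, anchor, dist):
    """Replace object p by the ring (anchor, dist(anchor, p))."""
    return (anchor, dist(anchor, p))

def distance_bounds(q, ring, dist):
    """Lower and upper bound on dist(q, p), using only the ring of p."""
    anchor, radius = ring
    d = dist(q, anchor)
    return abs(d - radius), d + radius

anchor = (0, 0)
p1, p2 = (3, 4), (7, 0)
ring1 = obfuscate(p1, anchor, manhattan)
ring2 = obfuscate(p2, anchor, manhattan)
print(ring1 == ring2)   # both objects lie on the same ring around the anchor

query = (1, 1)
low, high = distance_bounds(query, ring1, manhattan)
print(low, manhattan(query, p1), manhattan(query, p2), high)
```

Because p1 and p2 share one ring, the ring alone does not tell them apart, yet the bounds remain usable for pruning during search.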

CONCLUSIONS

We proposed similarity search techniques for sensitive metric
data, e.g., bioinformatics data, that enable outsourcing of such
search. Existing solutions either offer query efficiency without
privacy, or they offer complete data privacy while sacrificing
query efficiency. We introduce approaches that shift search
functionality to the server. The proposed Metric Preserving
Transformation (MPT) stores relative distance information at
the server with respect to a private set of anchor objects. This
method guarantees correctness of the final search result, but
at the cost of two rounds of communication. The proposed
Flexible Distance-based Hashing (FDH) method finishes in
just a single round of communication, but does not guarantee
retrieval of the exact result.
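
Why two rounds are enough for an exact answer can be seen in a simplified filter-and-refine sketch. This is not the paper's MPT: encryption and the transformation of the stored distances are omitted, distances are sent in the clear, and all function names and the Manhattan distance are assumptions for illustration. The server stores only each object's nearest anchor and its distance to that anchor.

```python
def manhattan(x, y):
    """L1 distance on integer tuples; stands in for the black-box metric dist()."""
    return sum(abs(a - b) for a, b in zip(x, y))

def build_index(objects, anchors, dist):
    """Owner side: per object, record (object id, nearest anchor id, distance to it)."""
    index = []
    for obj_id, p in enumerate(objects):
        a_id = min(range(len(anchors)), key=lambda i: dist(anchors[i], p))
        index.append((obj_id, a_id, dist(anchors[a_id], p)))
    return index

def lower_bound(entry, q_to_anchor):
    """Triangle inequality: dist(q, p) >= |dist(q, a) - dist(a, p)|."""
    _, a_id, radius = entry
    return abs(q_to_anchor[a_id] - radius)

def server_round_one(index, q_to_anchor):
    """Server: return the id of the most promising object."""
    return min(index, key=lambda e: lower_bound(e, q_to_anchor))[0]

def server_round_two(index, q_to_anchor, gamma):
    """Server: return every id that could still be within distance gamma."""
    return [e[0] for e in index if lower_bound(e, q_to_anchor) <= gamma]

def nn_query(q, objects, anchors, index, dist):
    """User side. objects[i] stands in for fetching and decrypting object i."""
    q_to_anchor = [dist(q, a) for a in anchors]
    first = server_round_one(index, q_to_anchor)
    gamma = dist(q, objects[first])          # tentative result distance
    candidates = server_round_two(index, q_to_anchor, gamma)
    best = min(candidates, key=lambda i: dist(q, objects[i]))
    return best, dist(q, objects[best])

objects = [(0, 0), (5, 5), (9, 1), (2, 7), (6, 3), (1, 4), (8, 8)]
anchors = [(0, 0), (9, 9)]
index = build_index(objects, anchors, manhattan)
print(nn_query((4, 4), objects, anchors, index, manhattan))
```

The true nearest neighbor is at most gamma away from the query, so its lower bound is at most gamma and the second round cannot miss it. A single-round method has no tentative distance to filter with, which is one way to see why it may return an approximate result.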
