23-09-2014, 02:43 PM
Building Confidential and Efficient Query
Services in the Cloud with RASP Data
Perturbation
Building Confidential.pdf (Size: 571.5 KB / Downloads: 56)
Abstract
—With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has
become an appealing solution for the advantages on scalability and cost-saving. However, some data might be sensitive that the
data owner does not want to move to the cloud unless the data confidentiality and query privacy are guaranteed. On the other
hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to
fully realize the benefits of cloud computing. We propose the RASP data perturbation method to provide secure and efficien
t
range query and kNN query services for protected data in the cloud. The RASP data perturbation method combines order
preserving encryption, dimensionality expansion, random noise injection, and random projection, to provide strong resilience to
attacks on the perturbed data and queries. It also preserves multidimensional ranges, which allows existing indexing techniques
to be applied to speedup range query processing. The kNN-R algorithm is designed to work with the RASP range query algorithm
to process the kNN queries. We have carefully analyzed the attacks on data and queries under a precisely defined threat model
and realistic security assumptions. Extensive experiments have been conducted to show the advantages of this approach on
efficiency and security
Definitions and Notations
First, we establish the notations. For simplicity, we
consider only single database tables, which can be the
result of denormalization from multiple relations. A
database table consists of n records and d searchable
attributes. We also frequently refer to an attribute as
a dimension or a column, which are exchangeable in
the paper. Each record can be represented as a vector
in the multidimensional space, denoted by low case
letters. If a record x is d-dimensional, we say x ∈ R
d
,
where R
d means the d-dimensional vector space. A
table is also treated as a d × n matrix, with records
represented as column vectors. We use capital letters
to represent a table, and indexed capital letters, e.g.,
Xi
, to represent columns. Each column is defined
on a numerical domain. Categorical data columns
are allows in range query, which are converted to
numerical domains as we will describe in Section 3.
Range query is an important type of query for many
data analytic tasks from simple aggregation to more
sophisticated machine learning tasks. Let T be a table
and Xi
, Xj , and Xk be the real valued attributes in
T , and a and b be some constants. Take the counting
query for example. A typical range query looks like
RASP RANGE-QUERY PROCESSING
Based on the RASP perturbation method, we design
the services for two types of queries: range query and
kNN query. This section will dedicate to range query
processing. We will first show that a range query in
the original space can be transformed to a polyhedron
query in the perturbed space, and then we develop a
secure way to do the query transformation. Then, we
will develop a two-stage query processing strategy for
efficient range query processing