25-08-2017, 09:32 PM
Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data
Enabling Secure and Efficient.pdf (Size: 929.9 KB / Downloads: 24)
Abstract
Cloud computing economically enables the paradigm of data service outsourcing. However, to protect data privacy,
sensitive cloud data have to be encrypted before outsourced to the commercial public cloud, which makes effective data utilization
service a very challenging task. Although traditional searchable encryption techniques allow users to securely search over encrypted
data through keywords, they support only Boolean search and are not yet sufficient to meet the effective data utilization need that is
inherently demanded by large number of users and huge amount of data files in cloud. In this paper, we define and solve the problem of
secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by enabling search result
relevance ranking instead of sending undifferentiated results, and further ensures the file retrieval accuracy. Specifically, we explore
the statistical measure approach, i.e., relevance score, from information retrieval to build a secure searchable index, and develop a
one-to-many order-preserving mapping technique to properly protect those sensitive score information. The resulting design is able to
facilitate efficient server-side ranking without losing keyword privacy. Thorough analysis shows that our proposed solution enjoys “asstrong-
as-possible” security guarantee compared to previous searchable encryption schemes, while correctly realizing the goal of
ranked keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.
INTRODUCTION
CLOUD Computing is the long dreamed vision of
computing as a utility, where cloud customers can
remotely store their data into the cloud so as to enjoy the
on-demand high-quality applications and services from a
shared pool of configurable computing resources [2]. The
benefits brought by this new computing model include but
are not limited to: relief of the burden for storage management,
universal data access with independent geographical
locations, and avoidance of capital expenditure on hardware,
software, and personnel maintenances, etc., [3].
As Cloud Computing becomes prevalent, more and
more sensitive information are being centralized into the
cloud, such as e-mails, personal health records, company
finance data, and government documents, etc. The fact that
data owners and cloud server are no longer in the same
trusted domain may put the outsourced unencrypted data
at risk [4], [33]: the cloud server may leak data information
to unauthorized entities [5] or even be hacked [6].
PROBLEM STATEMENT
The System and Threat Model
We consider an encrypted cloud data hosting service
involving three different entities, as illustrated in Fig. 1:
data owner, data user, and cloud server. Data owner has a
collection of n data files C ¼ ðF1; F2; . . . ; FnÞ that he wants to
outsource on the cloud server in encrypted form while still
keeping the capability to search through them for effective
data utilization reasons. To do so, before outsourcing, data
owner will first build a secure searchable index I from a set
of m distinct keywords W ¼ ðw1; w2; . . . ; wmÞ extracted2
from the file collection C, and store both the index I and the
encrypted file collection C on the cloud server.
We assume the authorization between the data owner
and users is appropriately done. To search the file collection
for a given keyword w, an authorized user generates and
submits a search request in a secret form—a trapdoor Tw of
the keyword w—to the cloud server. Upon receiving the
search request Tw, the cloud server is responsible to search
the index I and return the corresponding set of files to the
user. We consider the secure ranked keyword search
problem as follows: the search result should be returned
according to certain ranked relevance criteria (e.g., keyword
frequency-based scores, as will be introduced shortly), to
improve file retrieval accuracy for users without prior
knowledge on the file collection C. However, cloud server
should learn nothing or little about the relevance criteria as
they exhibit significant sensitive information against keyword
privacy. To reduce bandwidth, the user may send an
optional value k along with the trapdoor Tw and cloud
server only sends back the top-k most relevant files to the
user’s interested keyword w.
Design Goals
To enable ranked searchable symmetric encryption for
effective utilization of outsourced and encrypted cloud data
under the aforementioned model, our system design should
achieve the following security and performance guarantee.
Specifically, we have the following goals: 1) Ranked
keyword search: to explore different mechanisms for
designing effective ranked search schemes based on the
existing searchable encryption framework; 2) Security
guarantee: to prevent cloud server from learning the
plaintext of either the data files or the searched keywords,
and achieve the “as-strong-as-possible” security strength
compared to existing searchable encryption schemes;
3) Efficiency: above goals should be achieved with minimum
communication and computation overhead.
THE DEFINITIONS AND BASIC SCHEME
In the introduction, we have motivated the ranked keyword
search over encrypted data to achieve economies of scale for
Cloud Computing. In this section, we start from the review
of existing searchable symmetric encryption schemes and
provide the definitions and framework for our proposed
ranked searchable symmetric encryption. Note that by
following the same security guarantee of existing SSE, it
would be very inefficient to support ranked search
functionality over encrypted data, as demonstrated in our
basic scheme. The discussion of its demerits will lead to our
proposed scheme.
Background on Searchable Symmetric
Encryption
Searchable encryption allows data owner to outsource his
data in an encrypted manner while maintaining the
selectively search capability over the encrypted data.
Generally, searchable encryption can be achieved in its full
functionality using an oblivious RAMs [16]. Although
hiding everything during the search from a malicious
server (including access pattern), utilizing oblivious RAM
usually brings the cost of logarithmic number of interactions
between the user and the server for each search
request. Thus, in order to achieve more efficient solutions,
almost all the existing works on searchable encryption
literature resort to the weakened security guarantee, i.e.,
revealing the access pattern and search pattern but nothing
else.
The Basic Scheme
Before giving our main result, we first start with a
straightforward yet ideal scheme, where the security of
our ranked searchable encryption is the same as previous
SSE schemes, i.e., the user gets the ranked results without
letting cloud server learn any additional information more
than the access pattern and search pattern. However, this is
achieved with the tradeoff of efficiency: namely, either
should the user wait for two round-trip time for each
search request, or he may even lose the capability to
perform top-k retrieval, resulting the unnecessary communication
overhead. We believe the analysis of these
demerits will lead to our main result. Note that the basic
scheme we discuss here is tightly pertained to recent work
[12], though our focus is on secure result ranking. Actually,
it can be considered as the most simplified version of
searchable symmetric encryption that satisfies the nonadaptive
security definition of [12].
Using Order Preserving Symmetric Encryption
The OPSE is a deterministic encryption scheme where the
numerical ordering of the plaintexts gets preserved by the
encryption function. Boldyreva et al. [14] gives the first
cryptographic study of OPSE primitive and provides a
construction that is provably secure under the security
framework of pseudorandom function or pseudorandom
permutation. Namely, considering that any order-preserving
function gðÞ from domain D ¼ f1; . . .;Mg to range
R ¼ f1; . . .;Ng can be uniquely defined by a combination
of M out of N ordered items, an OPSE is then said to be
secure if and only if an adversary has to perform a brute
force search over all the possible combinations of M out of
N to break the encryption scheme. If the security level is
chosen to be 80 bits, then it is suggested to choose M ¼
N=2 > 80 so that the total number of combinations will be
greater than 280. Their construction is based on an
uncovered relationship between a random order-preserving
function (which meets the above security notion) and the
hypergeometric probability distribution, which will later be
denoted as HGD. We refer readers to [14] for more details
about OPSE and its security definition.
One-to-Many Order-Preserving Mapping
Therefore, we have to modify the OPSE to suit our purpose.
In order to reduce the amount of information leakage from
the deterministic property, an one-to-many OPSE scheme is
thus desired, which can flatten or obfuscate the original
relevance score distribution, increase its randomness, and
still preserve the plaintext order. To do so, we first briefly
review the encryption process of original deterministic
OPSE, where a plaintext m in domain D is always mapped
to the same random-sized nonoverlapping interval bucket
in range R, determined by a keyed binary search over the
range R and the result of a random HGD sampling
function. A ciphertext c is then chosen within the bucket
by using m as the seed for some random selection function.
RELATED WORK
Searchable encryption. Traditional searchable encryption
[8], [9], [10], [11], [12], [24], [25] has been widely studied as a
cryptographic primitive, with a focus on security definition
formalizations and efficiency improvements. Song et al. [8]
first introduced the notion of searchable encryption. They
proposed a scheme in the symmetric key setting, where
each word in the file is encrypted independently under a
special two-layered encryption construction. Thus, a
searching overhead is linear to the whole file collection
length. Goh [9] developed a Bloom filter-based per-file
index, reducing the workload for each search request
proportional to the number of files in the collection. Chang
and Mitzenmacher [11] also developed a similar per-file
index scheme. To further enhance search efficiency,
Curtmola et al. [12] proposed a per-keyword-based
approach, where a single encrypted hash table index is
built for the entire file collection, with each entry consisting
of the trapdoor of a keyword and an encrypted set of related
file identifiers. Searchable encryption has also been considered
in the public-key setting. Boneh et al. [10] presented
the first public-key-based searchable encryption scheme,
with an analogous scenario to that of [8]. In their
construction, anyone with the public key can write to the
data stored on the server but only authorized users with the
private key can search.
CONCLUDING REMARKS
In this paper, as an initial attempt, we motivate and solve
the problem of supporting efficient ranked keyword search
for achieving effective utilization of remotely stored
encrypted data in Cloud Computing. We first give a basic
scheme and show that by following the same existing
searchable encryption framework, it is very inefficient to
achieve ranked search. We then appropriately weaken the
security guarantee, resort to the newly developed crypto
primitive OPSE, and derive an efficient one-to-many orderpreserving
mapping function, which allows the effective
RSSE to be designed. We also investigate some further
enhancements of our ranked search mechanism, including
the efficient support of relevance score dynamics, the
authentication of ranked search results, and the reversibility
of our proposed one-to-many order-preserving mapping
technique. Through thorough security analysis, we show
that our proposed solution is secure and privacy preserving,
while correctly realizing the goal of ranked keyword
search. Extensive experimental results demonstrate the
efficiency of our solution.