25-08-2014, 03:56 PM
Integrity Verification of Multiple Data Copies over Untrusted Cloud Servers
Integrity.pdf (Size: 407.47 KB / Downloads: 18)
Abstract
—For an increased level of scalability, availability
and durability, some customers may want their data to be
replicated on multiple cloud servers. The more copies the cloud
service provider (CSP) is asked to store, the more fees the
customers are charged. In this paper, we propose a pairing based provable multi-copy data possession (PB-PMDP) scheme,
which provides an evidence to the customers that all outsourced
copies are actually stored and remain intact. Moreover, it
allows authorized users (i.e., those who have the right to access
the owner’s file) to seamlessly access the file copies stored
by the CSP, and supports public verifiability. The proposed
scheme is proved to be secure against colluding servers. We
illustrate the performance of the PB-PMDP scheme through
theoretical analysis, which is validated by experimental results.
The verification time of our scheme is practically independent
of the number of file copies. Additionally, we discuss how to
identify corrupted copies by slightly modifying the proposed
PB-PMDP scheme
I. INTRODUCTION
Outsourcing data to a remote cloud service provider (CSP)
allows organizations to store more data on the CSP than
on private computer systems. Such outsourcing also allows
authorized users to remotely access the data from different
geographic locations.
Once the data has been outsourced to a remote CSP,
which may not be trustworthy, the data owners lose the
direct control over their sensitive data. This lack of control
raises new formidable and challenging tasks related to data
confidentiality and integrity protection in cloud computing
systems. The confidentiality issue can be handled by encrypting sensitive data before outsourcing to remote servers.
As for data integrity, the data owners need to have a strong
evidence that the cloud servers still possess their data and
it is not being tampered with or partially deleted over time,
especially because the internal operation details of the CSP
may not be known to cloud customers. Consequently, many
researchers have focused on the problem of provable data
possession (PDP) and proposed different schemes to audit
the data on remote storage sites.
PDP is a technique for validating data integrity over
remote servers. In a typical PDP model, the data owner
generates some metadata/information for a data file to bused later for verification purposes through a challenge response protocol with the remote/cloud server. Researchers
have proposed different variations of PDP schemes under
different cryptographic assumptions; e.g., see [1]–[7].
PDP schemes presented in [1]–[7] focus on a single copy
of the file and provide no proof that the CSP stores multiple
copies of the owner’s file. More details and a comparative
analysis of various PDP schemes can be found in our
technical report [8].
Curtmla et al. [9] were the first to present a multiplereplica PDP (MR-PDP) scheme that creates multiple copies
of an owner’s file and audit them. The MR-PDP scheme
increases data availability; a corrupted data copy can be
reconstructed using duplicated copies on other servers. The
interaction between authorized users (users who have the
right to access the owner’s file) and the CSP was not
considered in [9]. The MR-PDP scheme supports private
verifiability, i.e., only the data owner can check data possession. Public verifiability is a key feature in remote data
checking schemes to avoid disputes that may occur between
the data owner and the CSP. Delegating the auditing process
(without revealing secret keys) to a trusted third party for
verifying the data integrity can solve such disputes.
Main contributions. Our contributions can be summarized
as follows:
• We propose a pairing-based provable multi-copy data
possession (PB-PMDP) scheme. This scheme provides
an adequate guarantee that the CSP stores all copies
that are agreed upon in the service contract, and these
copies are intact. The authorized users can seamlessly
access the copies received from the CSP. The PBPMDP scheme supports public verifiability.
• We justify the efficiency of the proposed PB-PMDP
scheme through performance analysis, experimental
results, and comparison with the MR-PDP model [9].
Moreover, we discuss a slight modification of the proposed PB-PMDP scheme to identify corrupted copies.
• We show the security of our scheme against colluding
servers.
that the verifier can reconstruct the entire data file from the
responses that are reliably transmitted from the server. This
is due to encoding of the data file, for example using erasure
codes, before outsourcing to remote servers. Various POR
schemes can be found in the literature, e.g., [10]–[15].
In the context of this work, we are considering
economically-motivated CSPs that may attempt to use less
storage than required by the service contract through deletion
of a few copies of the file. The CSPs have almost no financial
benefit by deleting only a small portion of a copy of the file.
More importantly, compared to erasure codes, duplicating
a data file across multiple servers offers a higher level of
scalability, which is crucial in cloud computing systems.
Besides, a server’s copy can be reconstructed even from a
complete damage using duplicated copies on other servers.
Paper organization
. The remainder of the paper is organized as follows. Section II reviews the MR-PDP scheme
of Curtmola et al. Section III contains our system and
assumptions. Section IV presents the proposed scheme. The
performance analysis and experimental results are shown in
Section V. How to identify the corrupted copies is discussed
in Section VI. Concluding remarks are given in Section VII.
II. MR-PDP SCHEME OF CURTMOLA ET AL.
The MR-PDP scheme of [9] is based on RSA. Initially,
a file F is fragmented into blocks {bj}1≤j≤m. The owner
encrypts F using a key K to obtain an encrypted version
Fe = {
˜bj}1≤j≤m, where ˜bj = EK(bj ). The owner generates
n distinct copies {Fbi}1≤i≤n, where Fbi = {
ˆbij}1≤j≤m,
ˆbij =
˜bj + rij (added as large integers in Z), and rij = fx(i|j).
fx is a pseudo-random function keyed with a secret key x.
In the MR-PDP scheme, if an authorized user interacts
with the CSP to access an owner’s file, the CSP retrieves
one of the available copies. Upon receiving this copy, the
authorized user has to know the copy number to properly
unmask it before decryption. Due to the opaqueness of the
internal operations of the CSP, the authorized users cannot
recognize which copy has been received. If i (the copy
number) is attached with each copy forming the structure
(i|Fbi), corrupting or swapping copy numbers hinder the
correct unmasking process. Thus, the authorized users are
unable to access the data file.
For the verification purpose, portion of the set {rij} is
needed to be generated. These random values cannot be
publicly known, otherwise the CSP can derive the encrypted
version Fe, and store only one copy. Hence, only private
verifiability is supported.
III. OUR SYSTEM AND ASSUMPTIONS
System components. The cloud computing storage model
considered in this work consists of three main components
as illustrated in Figure 1: (i) a data owner that can be an
individual or an organization originally possessing sensitive
The MR-PDP scheme of [9] is based on RSA. Initially,
a file F is fragmented into blocks {bj}1≤j≤m. The owner
encrypts F using a key K to obtain an encrypted version
Fe = {
˜bj}1≤j≤m, where ˜bj = EK(bj ). The owner generates
n distinct copies {Fbi}1≤i≤n, where Fbi = {
ˆbij}1≤j≤m,
ˆbij =
˜bj + rij (added as large integers in Z), and rij = fx(i|j).
fx is a pseudo-random function keyed with a secret key x.
In the MR-PDP scheme, if an authorized user interacts
with the CSP to access an owner’s file, the CSP retrieves
one of the available copies. Upon receiving this copy, the
authorized user has to know the copy number to properly
unmask it before decryption. Due to the opaqueness of the
internal operations of the CSP, the authorized users cannot
recognize which copy has been received. If i (the copy
number) is attached with each copy forming the structure
(i|Fbi), corrupting or swapping copy numbers hinder the
correct unmasking process. Thus, the authorized users are
unable to access the data file.
For the verification purpose, portion of the set {rij} is
needed to be generated. These random values cannot be
publicly known, otherwise the CSP can derive the encrypted
version Fe, and store only one copy. Hence, only private
verifiability is supported.
III. OUR SYSTEM AND ASSUMPTIONS
System components.
The cloud computing storage model
considered in this work consists of three main components
as illustrated in Figure 1: (i) a data owner that can be an
individual or an organization originally possessing sensitive
data to be stored in the cloud; (ii) a CSP who manages cloud
servers and provides paid storage space on its infrastructure
to store the owner’s files; and (iii) authorized users — a set
of owner’s clients who have the right to access the remote
data.. Such data are subject to infrequent change, so
we treat them as static.
Outsourcing and accessing.
The data owner has a file F
consisting of m blocks and the CSP offers to store n copies
{Fei}1≤i≤n of the owner’s file in exchange for pre-specified
fees metered in GB/month.
For data confidentiality, the owner encrypts his data before
outsourcing to the CSP. An authorized user of the outsourced
data sends a data-access request to the CSP and receives a
file copy in an encrypted form that can be decrypted using
a secret key shared with the owner. The authorized user is
unaware of which copy has been received.
Threat model.
The integrity of customers’ data in the cloud
may be at risk due to the following reasons
data to be stored in the cloud; (ii) a CSP who manages cloud
servers and provides paid storage space on its infrastructure
to store the owner’s files; and (iii) authorized users — a set
of owner’s clients who have the right to access the remote
data.
s. Such data are subject to infrequent change, so
we treat them as static.
Outsourcing and accessing. The data owner has a file F
consisting of m blocks and the CSP offers to store n copies
{Fei}1≤i≤n of the owner’s file in exchange for pre-specified
fees metered in GB/month.
For data confidentiality, the owner encrypts his data before
outsourcing to the CSP. An authorized user of the outsourced
data sends a data-access request to the CSP and receives a
file copy in an encrypted form that can be decrypted using
a secret key shared with the owner. The authorized user is
unaware of which copy has been received.
Threat model. The integrity of customers’ data in the cloud
may be at risk due to the following reasons
IV. PROPOSED PB-PMDP SCHEME
A. Overview and Rationale
Generating unique differentiable copies of the data file is
the core to design a multi-copy provable data possession
scheme. Identical data copies enable the CSP to simply
deceive the owner by storing only one copy and pretending
that it stores multiple copies. Using a simple yet efficient
way, the proposed scheme generates distinct copies utilizing
the diffusion property of any secure encryption scheme.
There will be an unpredictable complete change in the
ciphertext, if there is a single bit change in the plaintext.
The interaction between the authorized users and the CSP is
considered through this methodology of generating distinct
copies, where the former can decrypt and access a file copy
received from the CSP without recognizing the copy index.
Homomorphic linear authenticators (HLAs) [10], [14],
[16] are basic building blocks in the proposed scheme. We
utilize the BLS HLAs [10].
B. Notations
− F is a data file to be outsourced, and is composed of
a sequence of m blocks, i.e., F = {b1, b2, . . . , bm}.
− πkey(·) is a pseudo-random permutation (PRP): key ×
{0, 1}
log2
(m) → {0, 1}
log2
(m)
.
− ψkey(·) is a pseudo-random function (PRF): key ×
{0, 1}
∗ → Zp.
− Bilinear Map/Pairing. Let G1, G2, and GT be cyclic
groups of prime order p. Let g¯ and g be generators of
G1 and G2, respectively. A bilinear pairing is a map
eˆ : G1 × G2 → GT with bilinear, non-degenerate, and
computable properties [17].
− H(·) is a map-to-point hash function : {0, 1}
∗ → G1.
− EK is an encryption algorithm with strong diffusion
property, e.g., AES.
C. PB-PMDP Procedural Steps
Key Generation. Let eˆ : G1 × G2 → GT be a bilinear
map and g is a generator of G2. The data owner runs
the KeyGen algorithm to generate a private key x ∈ Zp
and a public key y = g
x ∈ G2 along with s elements
(u1, u2, . . . , us) ∈R G1.
Generation of Distinct Copies. The data owner runs
the CopyGen algorithm to create n differentiable
copies Fe = {Fei}1≤i≤n. The copy Fei
is generated by
concatenating a copy number i with the file F, then
encrypting using EK, i.e., Fei = EK(i|F). Fei
is
divided into blocks {
˜bij}1≤j≤m, and the block ˜bij is
further fragmented into s sectors {
˜bij1,
˜bij2, . . . ,
˜bijs},
i.e., the copy Fei = {
˜bijk}1≤j≤m
1≤k≤s
, where each sector
˜bijk ∈ Zp for some large prime p.
The authorized users need to keep only a single secret
key K. Later, when an authorized user receives a file
VI. IDENTIFYING CORRUPTED COPIES
The proof P = {σ, µ} generated by the CSP will be valid
and will pass the verification equation (1) only if all copies
are intact. Thus, when there is one or more corrupted copies,
the whole auditing procedure fails. To identify the corrupted
copies, a slightly modified version of the PB-PMDP scheme
can be used. In this version, Φ = {σij} 1≤i≤n
1≤j≤m
, i.e., no
tags aggregation. As a response, the CSP computes µ =
{µik}1≤i≤n
1≤k≤s
as before, but σ =
Y
(j,rj )∈Q
[
Yn
i=1
σij ]
rj ∈ G1.
The verifier first validates P using equation (1). If fails,
the verifier asks the CSP to send σ = {σi}1≤i≤n, where
σi =
Y
(j,rj )∈Q
σ
rj
ij . Thus, the verifier has two lists σList
= {σi}1≤i≤n and µList = {µik}1≤i≤n
1≤k≤s
(µList is a two
dimensional list).
Utilizing a recursive divide-and-conquer approach (binary
search), the verifier can identify the indices of corrupted
copies. Specifically, σList and µList are divided into halves:
σList→ (σLeft:σRight), and µList→ (µLeft:µRight). The
verification equaation (1) is applied recursively on σLeft with
µLeft and σRight with µRight. Note that the individual
tags in σLeft or σRight are aggregated via multiplication to
generate one σ that is used during the recursive application
of equation (1). This modification will be at the cost of some
extra storage, communication, and computation overheads.
VII. CONCLUSION
We have proposed a BP-PMDP scheme. The interaction
between the authorized users and the CSP is considered
in our scheme. Moreover, the proposed scheme supports
public verifiability, and allows unlimited number of auditing.
Security analysis is given in [8].
Through performance analysis, experimental results, and
comparison with the MR-PDP model, we have justified the
efficiency of the proposed scheme. The verification time of
PB-PMDP is practically independent of the number of file
copies.
A slight modification can be done on the proposed scheme
to identify the corrupted copies