11-07-2012, 09:58 AM
Data Integrity Proofs in Cloud Storage
Abstract
Cloud computing has been envisioned as the de facto
solution to the rising storage costs of IT enterprises. With the
high cost of data storage devices and the rapid rate at
which data is being generated, it is costly for enterprises
or individual users to upgrade their hardware frequently. Apart
from reducing storage costs, outsourcing data to the cloud
also helps reduce maintenance. Cloud storage moves the
user’s data to large, remotely located data centers
over which the user has no control. However, this unique
feature of the cloud poses many new security challenges that
need to be clearly understood and resolved.
One of the important concerns that needs to be addressed is
assuring the customer of the integrity, i.e. the correctness, of
his data in the cloud. As the data is not physically accessible
to the user, the cloud should provide a way for the user to check
whether the integrity of his data is maintained or compromised.
In this paper we provide a scheme that gives a proof of data
integrity in the cloud, which the customer can employ to check
the correctness of his data. This proof can be agreed upon by
both the cloud and the customer and can be incorporated in the
service level agreement (SLA). The scheme keeps the storage
required at the client side minimal, which is beneficial for
thin clients.
I. INTRODUCTION
Data outsourcing to cloud storage servers is a rising trend
among many firms and users owing to its economic advantages.
This essentially means that the owner (client) of the
data moves its data to a third-party cloud storage server, which
is supposed to - presumably for a fee - faithfully store the data
and provide it back to the owner whenever required.
As data generation is far outpacing data storage, it is
costly for small firms to upgrade their hardware frequently
whenever additional data is created. Maintaining the
storage can also be a difficult task. Outsourcing data
to cloud storage helps such firms by reducing the costs
of storage, maintenance and personnel. It can also provide
reliable storage of important data by keeping multiple copies
of the data, thereby reducing the chance of losing data through
hardware failures.
Storing user data in the cloud, despite its advantages,
raises many interesting security concerns which need to be
extensively investigated to make it a reliable solution to the
problem of avoiding local storage of data. Many problems, such as
data authentication and integrity (i.e., how to efficiently and
securely ensure that the cloud storage server returns correct
and complete results in response to its clients’ queries [1]),
outsourcing encrypted data and the associated difficult problems
of querying over the encrypted domain [2], have been discussed
in the research literature.
In this paper we deal with the problem of implementing a
protocol for obtaining a proof of data possession in the cloud,
sometimes referred to as proof of retrievability (POR). The
aim is to obtain and verify a proof that the data
stored by a user at a remote data storage in the
cloud (called cloud storage archives or simply archives) has
not been modified by the archive, thereby assuring the integrity
of the data. Such proofs are very helpful
in peer-to-peer storage systems, network file systems, long-term
archives, web-service object stores, and database systems.
Such verification systems prevent the cloud storage archives
from misrepresenting or modifying the data stored at them without
the consent of the data owner by using frequent checks on
the storage archives. Such checks must allow the data owner
to efficiently, frequently, quickly and securely verify that the
cloud archive is not cheating the owner. Cheating, in this
context, means that the storage archive might delete or
modify some of the data. It must be noted
that the storage server might not be malicious; instead, it
might be simply unreliable and lose or inadvertently corrupt
the hosted data. The data integrity schemes that are to
be developed therefore need to be equally applicable to malicious
and to unreliable cloud storage servers. A proof of
data possession scheme does not, by itself, protect the data
from corruption by the archive. It only allows detection of
tampering or deletion of a remotely located file at an unreliable
cloud storage server. To ensure file robustness, other kinds of
techniques, such as data redundancy across multiple systems, can
be used.
While developing proofs of data possession at untrusted
cloud storage servers we are often limited by the resources
at the cloud server as well as at the client. Given that the
data sizes are large and the data is stored at remote servers,
accessing the entire file can be expensive in I/O costs to the
storage server. Transmitting the file across the network to the
client can also consume considerable bandwidth. Since growth in
storage capacity has far outpaced the growth in data access as
well as network bandwidth, accessing and transmitting the entire
archive even occasionally greatly limits the scalability of the
network resources. Furthermore, the I/O needed to establish the
data proof interferes with the on-demand bandwidth of the server
used for normal storage and retrieval purposes. The problem
is further complicated by the fact that the owner of the data
may be a small device, such as a PDA (personal digital assistant)
or a mobile phone, with limited CPU power, battery power
and communication bandwidth. Hence a data integrity proof
that is to be developed needs to take the above limitations
into consideration. The scheme should be able to produce a
proof without the server having to access the entire file
or the client having to retrieve the entire file from the server.
The scheme should also minimize the local computation at the
client as well as the bandwidth consumed at the client.
Fig. 1. Schematic view of a proof of retrievability based on inserting random sentinels in the data file F [3]
978-1-4244-8953-4/11/$26.00 © 2011 IEEE
II. RELATED WORK
The simplest proof of retrievability (POR) scheme can be
built using a keyed hash function h_K(F). In this scheme the
verifier, before archiving the data file F in the cloud storage,
pre-computes the cryptographic hash h_K(F) of F and
stores this hash as well as the secret key K. To check whether
the integrity of the file F has been lost, the verifier releases
the secret key K to the cloud archive and asks it to compute and
return the value of h_K(F). By storing multiple hash values for
different keys the verifier can check the integrity of the file F
multiple times, each check being an independent proof.
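
The keyed-hash protocol above can be sketched as follows. This is a minimal illustration, not the implementation from [3]: it assumes HMAC-SHA256 as the keyed hash h_K, 32-byte random keys, and an in-memory file, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def precompute_checks(file_bytes: bytes, num_checks: int) -> list[tuple[bytes, bytes]]:
    """Verifier, before archiving: store one (key, hash) pair per future check."""
    checks = []
    for _ in range(num_checks):
        key = secrets.token_bytes(32)  # assumed key length
        tag = hmac.new(key, file_bytes, hashlib.sha256).digest()
        checks.append((key, tag))
    return checks


def archive_respond(stored_bytes: bytes, key: bytes) -> bytes:
    """Archive: given the released key, hash the entire stored file."""
    return hmac.new(key, stored_bytes, hashlib.sha256).digest()


def verify(expected_tag: bytes, response: bytes) -> bool:
    """Verifier: compare the archive's answer with the stored hash."""
    return hmac.compare_digest(expected_tag, response)


original = b"example file contents " * 1000
checks = precompute_checks(original, num_checks=3)

# Each key is released to the archive once and then discarded.
key, tag = checks.pop()
intact_ok = verify(tag, archive_respond(original, key))

key, tag = checks.pop()
tampered = original[:-1] + b"X"  # archive silently changed one byte
tampered_ok = verify(tag, archive_respond(tampered, key))

print(intact_ok, tampered_ok, len(checks))
```

Because a key is revealed to the archive when it is used, each stored pair supports only one check, which is why the verifier has to keep as many pairs as the number of checks it intends to run.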
Though this scheme is very simple and easily implementable,
its main drawback is the high resource
cost of the implementation. At the verifier
side it involves storing as many keys as the number of checks
the verifier wants to perform, as well as the hash value of the
data file F under each key. Computing the hash value of even a
moderately large data file can also be computationally burdensome
for some clients (PDAs, mobile phones, etc.). At the archive
side, each invocation of the protocol requires the archive
to process the entire file F. This can be computationally
burdensome for the archive even for a lightweight operation
like hashing. Furthermore, each proof requires
the prover to read the entire file F - a significant overhead for
an archive whose intended load is only an occasional read per
file, were every file to be tested frequently [3].
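
To put rough numbers on these costs, the sketch below tallies what the keyed-hash scheme requires for a given number of checks. The 32-byte key and digest sizes, the 10 GiB file and the 365 checks are assumptions chosen for illustration; none of these figures come from the paper.

```python
def keyed_hash_costs(file_size_bytes: int, num_checks: int,
                     key_bytes: int = 32, digest_bytes: int = 32) -> dict[str, int]:
    """Tally the costs of the keyed-hash scheme for a given number of checks."""
    return {
        # Verifier keeps one key and one hash value per planned check.
        "verifier_storage_bytes": num_checks * (key_bytes + digest_bytes),
        # Verifier hashes the whole file once per key before archiving.
        "setup_bytes_hashed": num_checks * file_size_bytes,
        # Archive reads and hashes the whole file at every check.
        "archive_bytes_read": num_checks * file_size_bytes,
    }


GIB = 2**30
costs = keyed_hash_costs(file_size_bytes=10 * GIB, num_checks=365)
for name, value in costs.items():
    print(name, value)
```

Under these assumptions every quantity grows linearly with the number of checks, and both the setup at the client and each verification at the archive require a full pass over the file.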
Ari Juels and Burton S. Kaliski Jr. proposed a scheme called
proof of retrievability for large files using "sentinels" [3]. In
this scheme, unlike in the keyed-hash scheme, a
single key suffices irrespective of the size of the file or
the number of files whose retrievability the verifier wants to
check. Also, the archive needs to access only a small portion of
the file F, unlike in the keyed-hash scheme, which requires the
archive to process the entire file F for each protocol
verification. The size of this portion is in fact independent of
the length of F. The schematic view of this approach is shown in
Figure 1.
In this scheme special blocks (called sentinels) are hidden
among other blocks in the data file F. In the setup phase,
the verifier randomly embeds these sentinels among the data
blocks. During the verification phase, to check the integrity
of the data file F, the verifier challenges the prover (cloud
archive) by specifying the positions of a collection of sentinels
and asking the prover to return the associated sentinel values.
If the prover has modified or deleted a substantial portion of
F, then with high probability it will also have suppressed
a number of sentinels. It is therefore unlikely to respond
correctly to the verifier. To make the sentinels indistinguishable
from the data blocks, the whole modified file is encrypted
before it is stored at the archive. This scheme
is therefore best suited for storing encrypted files.
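
The setup and verification phases described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the construction from [3]: it omits the encryption and error-correcting steps, derives sentinel values with HMAC-SHA256, and picks positions with Python's `random` module seeded from the key, which is not cryptographically secure. All function names and the 16-byte block size are hypothetical. The last function estimates the detection probability under the further assumption that each queried sentinel is corrupted independently with probability equal to the corrupted fraction of the file.

```python
import hashlib
import hmac
import random

BLOCK_SIZE = 16  # bytes per block; arbitrary choice for this sketch


def make_sentinel(key: bytes, index: int) -> bytes:
    """Derive the index-th sentinel value from the single secret key."""
    msg = b"sentinel" + index.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK_SIZE]


def sentinel_positions(key: bytes, total_blocks: int, num_sentinels: int) -> list[int]:
    """Key-derived sentinel positions (random.Random is NOT secure; sketch only)."""
    rng = random.Random(hmac.new(key, b"positions", hashlib.sha256).digest())
    return sorted(rng.sample(range(total_blocks), num_sentinels))


def embed(data_blocks: list[bytes], key: bytes, num_sentinels: int) -> list[bytes]:
    """Setup phase: hide sentinels among the data blocks."""
    total = len(data_blocks) + num_sentinels
    positions = sentinel_positions(key, total, num_sentinels)
    sentinel_at = {pos: make_sentinel(key, i) for i, pos in enumerate(positions)}
    data_iter = iter(data_blocks)
    return [sentinel_at[p] if p in sentinel_at else next(data_iter)
            for p in range(total)]


def challenge(key: bytes, total_blocks: int, num_sentinels: int,
              num_queries: int, rng: random.Random) -> list[tuple[int, bytes]]:
    """Verifier: pick sentinel positions to query, with their expected values."""
    positions = sentinel_positions(key, total_blocks, num_sentinels)
    picked = rng.sample(range(num_sentinels), num_queries)
    return [(positions[i], make_sentinel(key, i)) for i in picked]


def verify(stored_blocks: list[bytes], queries: list[tuple[int, bytes]]) -> bool:
    """Archive returns the blocks at the queried positions; verifier compares."""
    return all(hmac.compare_digest(stored_blocks[pos], expected)
               for pos, expected in queries)


def detection_probability(corrupted_fraction: float, num_queries: int) -> float:
    """Chance that at least one queried sentinel is hit (independence assumed)."""
    return 1 - (1 - corrupted_fraction) ** num_queries


key = b"demo secret key"
data = [bytes([i % 256]) * BLOCK_SIZE for i in range(1000)]
stored = embed(data, key, num_sentinels=100)

queries = challenge(key, len(stored), 100, num_queries=20, rng=random.Random(1))
honest_ok = verify(stored, queries)

wiped = [b"\x00" * BLOCK_SIZE] * len(stored)  # archive lost every block
wiped_ok = verify(wiped, queries)

print(honest_ok, wiped_ok, detection_probability(0.1, 20))
```

In this sketch the archive touches only the queried blocks, so the work per check depends on the number of queries rather than on the length of F. A queried sentinel is revealed to the archive, so a real deployment would not reuse it in later challenges.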
As this scheme involves encrypting the file F using a
secret key, it becomes computationally cumbersome, especially
when the data to be encrypted is large. Hence, this scheme
proves disadvantageous to small users with limited computational
power (PDAs, mobile phones, etc.). There is also a
storage overhead at the server, partly due to the newly inserted
sentinels and partly due to the error-correcting codes that are
inserted. The client also needs to store all the sentinels,
which may be a storage overhead for thin clients (PDAs, low
power devices, etc.).