30-01-2013, 12:34 PM
A Survey of Confidential Data Storage and Deletion Methods
Abstract
As the amount of digital data grows, so does the theft of sensitive data through the loss or misplacement of laptops, thumb drives,
external hard drives, and other electronic storage media. Sensitive data may also be leaked accidentally due to improper disposal or
resale of storage media. To protect the secrecy of data throughout its entire lifetime, we must have confidential ways to store and delete data.
This survey summarizes and compares existing methods of providing confidential storage and deletion of data in personal computing
environments.
INTRODUCTION
As the cost of electronic storage declines rapidly, more and more sensitive data are stored on media such as
hard disks, CDs, and thumb drives. The push toward the paperless office also drives businesses toward
converting sensitive documents, once stored in locked filing cabinets, into digital forms. Today, an
insurance agent can carry a laptop that holds thousands of Social Security numbers, medical histories, and
other confidential information.
As early as 2003, the U.S. Census Bureau reported that two-thirds of American households had at least
one computer, with about one-third of adults using computers to manage household finances and make
online purchases [U.S. Census Bureau 2005]. These statistics suggest that many computers store data on
personal finances and online transactions, not to mention other confidential data such as tax records,
passwords for bank accounts, and email. It is reasonable to assume that these figures have risen
considerably since the 2003 survey.
SECURITY BACKGROUND
This section is designed for storage researchers and provides the relevant security concepts used when
comparing storage designs.
The general concept of secure handling of data is composed of three aspects: confidentiality, integrity,
and availability. Confidentiality involves ensuring that information is not read by unauthorized persons.
Encrypting stored data and authenticating users are two common means of achieving confidentiality.
Integrity ensures that the information is not altered by unauthorized persons. Combining a
message authentication code with sensitive data is a way to verify integrity. Finally, availability ensures
that data is accessible when needed. Having multiple servers to withstand a malicious shutdown of a server
is one way to improve availability.
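As an illustration of the integrity mechanism mentioned above, the following minimal sketch attaches a message authentication code to a record and later checks it. It assumes HMAC-SHA256 from the Python standard library as the MAC; the record contents and the helper names `make_tag` and `verify_tag` are hypothetical and do not come from the survey.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the data (helper name is illustrative)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(key: bytes, data: bytes, expected_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, data), expected_tag)


# Hypothetical secret key and record, used only for this example.
key = secrets.token_bytes(32)
record = b"account=42;balance=100"
record_tag = make_tag(key, record)

# An attacker without the key alters the stored record.
tampered = record.replace(b"balance=100", b"balance=900")

print(verify_tag(key, record, record_tag))    # the unmodified record verifies
print(verify_tag(key, tampered, record_tag))  # the altered record does not
```

Note that the tag detects modification only; it does nothing to hide the record, so confidentiality still requires encryption.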
Commonly Used Encryption Algorithms
Encryption is a procedure used in cryptography “to scramble information so that only someone knowing
the appropriate secret can obtain the original information (through decryption)” [Kaufman et al. 2002]. The
secret is often a key of n random bits, which can be derived from a
password or passphrase. A key’s strength is often associated with its length: if the key consists
of truly random bits, an attacker who does not know it must fall back on a brute-force enumeration of
the key space (up to 2^n candidate keys) to decrypt the original message.
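The derivation of a key from a passphrase, and the growth of the key space with key length, can be sketched as follows. This is one possible approach, assuming PBKDF2 with HMAC-SHA256 from the Python standard library; the survey does not prescribe a particular derivation function, and the passphrase, salt size, iteration count, and helper names are illustrative assumptions.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes, n_bits: int = 256,
               iterations: int = 200_000) -> bytes:
    """Derive an n-bit key from a passphrase with PBKDF2-HMAC-SHA256.

    The iteration count is an assumed value chosen for illustration.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=n_bits // 8)


def key_space(n_bits: int) -> int:
    """Number of distinct keys an exhaustive search may have to try."""
    return 2 ** n_bits


salt = secrets.token_bytes(16)  # non-secret, stored alongside the data
key = derive_key("an example passphrase", salt)

print(len(key) * 8)                      # key length in bits
print(key_space(128) // key_space(56))   # how much larger a 128-bit space is
```

A key derived this way is only as unpredictable as the passphrase behind it, so the 2^n bound applies in full only when the key bits are truly random.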
An encryption algorithm, or cipher, takes a plaintext as input and produces encrypted text (i.e.,
ciphertext); similarly, a decryption algorithm takes a ciphertext as input and generates decrypted text (i.e.,
plaintext). Encryption algorithms can be either symmetric or asymmetric. Symmetric algorithms use the
same key for both encryption and decryption. Asymmetric algorithms use two keys: one for encryption and
another for decryption. For example, public-key cryptography uses two keys (public and private keys) and
is often used to establish secure communication across a network where there is no way to exchange a
symmetric key beforehand. Symmetric encryption schemes can be many times faster than comparable
asymmetric schemes, and are therefore used more often in secure data storage, especially when the data in
question does not traverse through an insecure network.
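The difference between the two families can be seen in a toy sketch. The symmetric half XORs the message with a shared random key of equal length, and the asymmetric half is textbook RSA with tiny primes. Both are assumptions chosen for brevity and are not secure; neither is a scheme discussed in the survey. The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with an equally long key."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Symmetric: the SAME key encrypts and decrypts.
message = b"tax record 2003"
shared_key = secrets.token_bytes(len(message))
ciphertext = xor_bytes(message, shared_key)
recovered = xor_bytes(ciphertext, shared_key)

# Asymmetric: textbook RSA with tiny primes (illustration only).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, used to encrypt
d = pow(e, -1, phi)        # private exponent, used to decrypt

m = 65                     # a message encoded as an integer smaller than n
c = pow(m, e, n)           # anyone holding (n, e) can encrypt
m_recovered = pow(c, d, n) # only the holder of d can decrypt

print(recovered == message, m_recovered == m)
```

In the symmetric case both parties must already share `shared_key`; in the asymmetric case `(n, e)` can be published while `d` stays private, which is why the latter suits key establishment over a network.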
Traditional Modes of Operation
The operating mode of an encryption algorithm allows block ciphers to output messages of arbitrary length
or turns block ciphers into self-synchronizing stream ciphers, which generate a continuous key stream to
produce ciphertext of arbitrary length. For example, using AES alone, one may only input and output
blocks of 128 bits each. Using AES with a mode of operation for a block cipher, one may input and output
data of any length.
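To make the role of a mode concrete, here is a minimal sketch of counter (CTR) mode, one of the standard modes, which turns a fixed-size block function into a cipher for data of any length. CTR is a synchronous stream mode, not a self-synchronizing one. Because the Python standard library has no AES, the block function `toy_block_encrypt` is a stand-in built from HMAC-SHA256 truncated to 128 bits; it is an assumption made for illustration and is not AES.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # 128-bit blocks, the same size as an AES block


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction (NOT AES)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 16 bytes")
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_BYTES]


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """CTR mode: the same call encrypts and decrypts data of any length."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes in this sketch")
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        # Each block gets a unique input: nonce followed by a block counter.
        counter_block = nonce + (offset // BLOCK_BYTES).to_bytes(8, "big")
        keystream = toy_block_encrypt(key, counter_block)
        chunk = data[offset:offset + BLOCK_BYTES]
        # zip() stops at the shorter input, so a short final chunk is fine.
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)


key = b"k" * 32
nonce = b"\x00" * 7 + b"\x01"
plaintext = b"thirty-seven bytes of sensitive data!"
ciphertext = ctr_xor(key, nonce, plaintext)
print(len(plaintext), len(ciphertext))
print(ctr_xor(key, nonce, ciphertext) == plaintext)
```

The block function alone accepts only 16-byte inputs, whereas `ctr_xor` accepts any length, including lengths that are not a multiple of the block size. A nonce must never be reused with the same key in this mode.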
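An initialization vector (IV) is commonly used with many block cipher modes of operation: it is a small,
often random, but non-secret value used to help introduce randomness into the encryption, so that
encrypting the same plaintext twice under the same key yields different ciphertexts. The IV is typically
applied at the beginning of the encryption, when the first block is processed.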
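The effect of the IV can be sketched with full-block cipher feedback (CFB) mode, in which the IV feeds the first block and each ciphertext block feeds the next. As an assumption for illustration, the block function is again a stand-in built from HMAC-SHA256 rather than a real block cipher, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_BYTES = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction (NOT AES)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("block must be exactly 16 bytes")
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_BYTES]


def cfb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random IV and return IV || ciphertext."""
    iv = secrets.token_bytes(BLOCK_BYTES)
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK_BYTES):
        keystream = toy_block_encrypt(key, prev)  # the IV is used for block 0
        chunk = bytes(p ^ k for p, k in
                      zip(plaintext[i:i + BLOCK_BYTES], keystream))
        out.extend(chunk)
        prev = chunk  # feed the ciphertext block into the next step
    return iv + bytes(out)


def cfb_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the stored IV, then reverse the feedback chain."""
    prev, ciphertext = blob[:BLOCK_BYTES], blob[BLOCK_BYTES:]
    out = bytearray()
    for i in range(0, len(ciphertext), BLOCK_BYTES):
        keystream = toy_block_encrypt(key, prev)
        chunk = ciphertext[i:i + BLOCK_BYTES]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
        prev = chunk
    return bytes(out)


key = secrets.token_bytes(32)
plaintext = b"same plaintext, encrypted twice"
first = cfb_encrypt(key, plaintext)
second = cfb_encrypt(key, plaintext)
print(first != second)                        # different IVs, different output
print(cfb_decrypt(key, first) == plaintext)   # the stored IV allows decryption
```

The IV is stored in the clear in front of the ciphertext, which is acceptable because it is not secret; only the key must remain confidential.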
CONCLUSION
This survey took a look at the methods, advantages, and limitations of confidential storage and deletion
methods for electronic media in a non-distributed, single-user environment, with a dead forensic attack
model. We compared confidential data handling methods using characteristics associated with
confidentiality, policy, ease-of-use, and performance. Additionally, we discussed challenges such as hard-disk
issues and the data lifetime problem, as well as the overall trends of various approaches. By compiling
experiences and constraints of various confidential storage and deletion techniques, we hope that
knowledge from research areas that have been evolving independently can cross-disseminate to form
solutions that are tolerant to a broader range of constraints.