01-05-2012, 11:59 AM
A Survey of Confidential Data Storage and Deletion Methods
1. INTRODUCTION
As the cost of electronic storage declines rapidly, more and more sensitive data are stored on media such as
hard disks, CDs, and thumb drives. The push toward the paperless office also drives businesses toward
converting sensitive documents, once stored in locked filing cabinets, into digital forms. Today, an
insurance agent can carry a laptop that holds thousands of Social Security numbers, medical histories, and
other confidential information.
As early as 2003, the U.S. Census Bureau reported that two-thirds of American households have at least
one computer, with about one-third of adults using computers to manage household finances and make
online purchases [U.S. Census Bureau 2005]. These statistics suggest that many computers store data on
personal finances and online transactions, not to mention other confidential data such as tax records,
passwords for bank accounts, and email. These figures have likely risen substantially since
the 2003 survey.
Sensitive information stored in an insecure manner is vulnerable to theft. According to the most recent
CSI Computer Crime and Security Survey [Richardson 2007], 50 percent of the respondents have been
victims of laptop and mobile theft in the last 12 months. These respondents include 494 security
practitioners in U.S. corporations, government agencies, financial institutions, medical institutions, and
universities. The survey also shows that between the years 2001 and 2007, the theft of electronic storage
occurred much more frequently than other forms of attack or misuse such as denial of service, telecom
fraud, unauthorized access to information, financial fraud, abuse of wireless network, sabotage, and Web
site defacement. Many incidents of theft continue to make headlines across many segments of
society, including the government [Hines 2007], academia [Square 2007], the health industry [McNevin
2007; Sullivan 2005], and large companies [WirelessWeek 2007].
Two major components are needed to safeguard the privacy of data on electronic storage media. First, data
must be stored in a confidential manner to prevent unauthorized access, and the solution should not impose
significant inconvenience during normal use. Second, at the time of disposal, confidential data must be
removed from the storage media as well as the overall computing environment in an irrecoverable manner.
While the fundamental goals are clear, existing solutions tend to evolve independently around each goal.
This survey summarizes the advantages and challenges of various confidential data storage and deletion
techniques, with the aim of identifying underlying trends and lessons to arrive at an overarching solution.
Due to the size of the problem, the focus of this survey is on non-distributed, single-user computing
environments (e.g., desktops and laptops). The user is assumed to have system administrative privileges to
configure the confidentiality settings of storage and deletion methods. The threat model assumes that
attacks to recover sensitive data are staged after the computer has been powered off: in other words, the
attacker uses “dead” forensic methods. Attacks that occur while the user is logged in (e.g., network-based
attacks or memory-based attacks) are beyond the scope of this survey.
2. SECURITY BACKGROUND
This section is designed for storage researchers and provides the relevant security concepts used when
comparing storage designs.
The general concept of secure handling of data is composed of three aspects: confidentiality, integrity,
and availability. Confidentiality involves ensuring that information is not read by unauthorized persons.
Encrypting stored data and authenticating valid users are two examples of how confidentiality is
achieved. Integrity ensures that the information is not altered by unauthorized persons. Combining a
message authentication code with sensitive data is a way to verify integrity. Finally, availability ensures
that data is accessible when needed. Having multiple servers to withstand a malicious shutdown of a server
is one way to improve availability.
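As a minimal sketch of the integrity check mentioned above, the following Python fragment attaches a message authentication code to a record and verifies it later. The choice of HMAC-SHA256, the 256-bit key, and the sample record are assumptions made for illustration; the survey does not prescribe a particular MAC construction.

```python
import hashlib
import hmac
import secrets


def tag_record(key: bytes, record: bytes) -> bytes:
    """Return an HMAC-SHA256 tag computed over the record."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify_record(key: bytes, record: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_record(key, record), tag)


# Hypothetical key and record, used only for illustration.
mac_key = secrets.token_bytes(32)
record = b"policy=12345;holder=J. Doe"
tag = tag_record(mac_key, record)

print(verify_record(mac_key, record, tag))            # unmodified record
print(verify_record(mac_key, record + b"!", tag))     # altered record
```

Only a holder of the key can produce a valid tag, so an unauthorized change to the stored record is detected at verification time. The MAC by itself does not hide the record's contents; confidentiality still requires encryption.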
This survey compares various confidential storage and deletion approaches in terms of how each trades
confidentiality against convenience (e.g., ease-of-use, performance, and flexibility of setting security
policies). Integrity and availability are assumed to be provided and are beyond the scope of this survey.
The strength of confidentiality is the result of how well secure storage and deletion mechanisms address
the following questions:
• If encryption is used, how well are the keys protected?
• Do copies of sensitive data reside at multiple places in the system?
• If encryption is used, how strong is the encryption mechanism and mode of operation in terms of the
computational efforts to subvert the encryption?
• Can deleted data be recovered? If so, what are the time and resource costs?
• Is the entire file securely deleted, or is some portion left behind (such as the file name or other
metadata)?
In other words, the confidential data must not be accessed by unauthorized persons after it is properly
stored or deleted.
The ease-of-use of an approach reflects the level of inconvenience imposed on end users. Methods of
confidential storage and deletion that are too hard to use will either encourage users to circumvent the
methods or discourage users from using them entirely [Whitten and Tygar 1999]. Some aspects of the user
model examined include:
• The number of times a person must enter an encryption key per session
• The ease with which the method is invoked
• The number of encryption keys or passwords a person or a system must remember
Performance is another form of convenience, as methods that either take too long or consume
unreasonable amounts of system resources will not be used. For both confidential storage and deletion,
performance can be measured by the latency and bandwidth of file access/erasure and overhead pertaining
to the encryption algorithm and the mode of operation used. Additionally, both methods can be measured
by the time taken per operation and total amount of system resources used.
Security policies comprise a set of rules, laws, and practices that regulate how an individual or
organization manages, distributes, and protects secure information. A policy may be specific to a person or
organization and may need to change frequently. This survey compares the flexibility of each method, that is, the
ease of configuration, with regard to the implementation of various confidential storage and deletion
policies. Some aspects that were examined include:
• Method compatibility with legacy applications and file systems
• The ease of key or password revocation
• How easily one may change the method’s configuration to fulfill a security policy (e.g., encryption
algorithm and key size)
• Whether one can control the granularity (e.g., file and disk partition) of confidential storage and
deletion operations
Many techniques of confidential storage and deletion involve cryptography. The following subsection
briefly introduces and compares commonly used encryption algorithms and their modes of operation.
2.1 Commonly Used Encryption Algorithms
Encryption is a procedure used in cryptography “to scramble information so that only someone knowing
the appropriate secret can obtain the original information (through decryption)” [Kaufman et al. 2002]. The
secret is often a key of n random bits, which can be derived from a
password or passphrase. A key’s strength is often associated with its length: if the key consists
of truly random bits, an attacker must resort to a brute-force enumeration of the key space to recover the original message.
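Deriving an n-bit key from a passphrase can be sketched as follows. The use of PBKDF2 with HMAC-SHA256, the 16-byte salt, the iteration count, and the sample passphrase are all assumptions chosen for illustration; the survey does not name a specific derivation function.

```python
import hashlib
import secrets


def derive_key(passphrase: str, salt: bytes, key_bits: int = 128,
               iterations: int = 200_000) -> bytes:
    """Stretch a passphrase into a key of key_bits bits using PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256",                      # underlying hash (assumed choice)
        passphrase.encode("utf-8"),
        salt,
        iterations,                    # work factor (assumed value)
        dklen=key_bits // 8,
    )


# A random salt is stored alongside the ciphertext; it need not be secret.
salt = secrets.token_bytes(16)
key = derive_key("correct horse battery staple", salt)
print(len(key) * 8, "bit key:", key.hex())
```

A key derived this way is only as unpredictable as the passphrase behind it, so a short or guessable passphrase lets an attacker enumerate passphrases instead of the full n-bit key space.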
An encryption algorithm, or cipher, takes a plaintext as input and produces encrypted text (i.e.,
ciphertext); similarly, a decryption algorithm takes a ciphertext as input and generates decrypted text (i.e.,
plaintext). Encryption algorithms can be either symmetric or asymmetric. Symmetric algorithms use the
same key for both encryption and decryption. Asymmetric algorithms use two keys: one for encryption and
another for decryption. For example, public-key cryptography uses two keys (public and private keys) and
is often used to establish secure communication across a network where there is no way to exchange a
symmetric key beforehand. Symmetric encryption schemes can be many times faster than comparable
asymmetric schemes, and are therefore used more often in secure data storage, especially when the data in
question does not traverse an insecure network.
Common symmetric key encryption algorithms include the Data Encryption Standard (DES), Triple-
DES (3DES), and the Advanced Encryption Standard (AES). These algorithms are block ciphers, meaning
that they take a block of symbols of size n as input and output a block of symbols of size n. DES was
published in 1975 and adopted as the U.S. standard for unclassified applications in 1977 [Stinson 2002].
DES uses a key size of 56 bits and a block size of 64 bits. The main criticism of DES today is that the
56-bit key length is too short. With modern hardware, the entire key space of 2^56 keys can be enumerated.
As early as 1998, a machine called the “DES Cracker” could find a DES key in 56 hours.
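The size of the DES key space can be put into perspective with a short calculation. The search rate below is a hypothetical figure chosen for illustration, not a measurement of any particular machine.

```python
SECONDS_PER_DAY = 86_400


def key_space(key_bits: int) -> int:
    """Number of distinct keys for a key of key_bits random bits."""
    return 2 ** key_bits


def worst_case_days(key_bits: int, keys_per_second: float) -> float:
    """Days needed to try every key at the given search rate."""
    return key_space(key_bits) / keys_per_second / SECONDS_PER_DAY


ASSUMED_RATE = 1e9  # hypothetical: one billion key trials per second

print("DES keys:", key_space(56))
print("worst case (days):", round(worst_case_days(56, ASSUMED_RATE), 1))
print("average case (days):", round(worst_case_days(56, ASSUMED_RATE) / 2, 1))
```

On average an exhaustive search succeeds after trying half of the key space, and each additional key bit doubles the work. Dedicated or parallel hardware raises the search rate and shortens these times proportionally.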
Triple-DES was built to enlarge the DES key space without requiring users to switch to a new
encryption algorithm. 3DES operates by performing three DES operations on the data with three keys:
encryption with key one, decryption with key two, and encryption with key three. The three keys increase
the key space to 2^168, but the effective key length of 3DES is only about 112 bits, twice that of DES, as
demonstrated by the meet-in-the-middle attack [Chaum and Evertse 1985]. Unfortunately, performing three cryptographic operations
for every data access imposes a high performance penalty.
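The encrypt-decrypt-encrypt composition described above can be sketched with a stand-in cipher. The toy_encrypt and toy_decrypt functions below are hypothetical placeholders that are NOT DES and offer no security; they exist only so that the three-key structure can be run and checked.

```python
MASK64 = (1 << 64) - 1  # the toy cipher uses 64-bit blocks, like DES


def _rotl(v: int, r: int) -> int:
    return ((v << r) | (v >> (64 - r))) & MASK64


def _rotr(v: int, r: int) -> int:
    return ((v >> r) | (v << (64 - r))) & MASK64


def toy_encrypt(key: int, block: int) -> int:
    """Insecure stand-in for single-DES encryption of one block."""
    return (_rotl(block ^ key, 7) + key) & MASK64


def toy_decrypt(key: int, block: int) -> int:
    """Inverse of toy_encrypt."""
    return _rotr((block - key) & MASK64, 7) ^ key


def ede_encrypt(k1: int, k2: int, k3: int, block: int) -> int:
    """Encrypt with key one, decrypt with key two, encrypt with key three."""
    return toy_encrypt(k3, toy_decrypt(k2, toy_encrypt(k1, block)))


def ede_decrypt(k1: int, k2: int, k3: int, block: int) -> int:
    """Undo ede_encrypt by reversing the three steps."""
    return toy_decrypt(k1, toy_encrypt(k2, toy_decrypt(k3, block)))


# Hypothetical 56-bit keys and a sample 64-bit plaintext block.
k1, k2, k3 = 0x1A2B3C4D5E6F70, 0x0F1E2D3C4B5A69, 0x77665544332211
plaintext = 0x0123456789ABCDEF
ciphertext = ede_encrypt(k1, k2, k3, plaintext)
print(hex(ciphertext), hex(ede_decrypt(k1, k2, k3, ciphertext)))
```

When all three keys are equal, the middle decryption cancels the first encryption and the composition reduces to a single encryption, which is why this ordering can interoperate with single-key deployments. The three cipher invocations per block also make the performance cost visible.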
DES was replaced with the Advanced Encryption Standard (AES) algorithm in 2001. AES has a block
length of 128 bits and supports key lengths of 128, 192, and 256 bits. Among the five finalist algorithms
considered for AES (MARS, RC6, Rijndael, Serpent, Twofish), Rijndael was chosen “because its
combination of security, performance, efficiency, implementability, and flexibility was judged to be
superior to the other finalists” [Stinson 2002]. The National Security Agency (NSA) reviewed all five
finalists and concluded that they were secure enough for U.S. Government non-classified data. In 2003, the
U.S. government announced that AES, with all key lengths, is sufficient to protect classified information up
to the Secret level. Protecting classified information at the Top Secret level requires AES key lengths of
192 or 256 bits [Ferguson et al. 2001].