09-02-2013, 11:43 AM
Privacy-Preserving OLAP: An Information-Theoretic Approach
Abstract
We address issues related to the protection of private information in Online Analytical Processing (OLAP) systems, where a
major privacy concern is the adversarial inference of private information from OLAP query answers. Most previous work on privacy-preserving
OLAP focuses on a single aggregate function and/or addresses only exact disclosure, which eliminates from consideration
an important class of privacy breaches where partial information, but not exact values, of private data is disclosed (i.e., partial
disclosure). We address privacy protection against both exact and partial disclosure in OLAP systems with mixed aggregate functions.
In particular, we propose an information-theoretic inference control approach that supports a combination of common aggregate
functions (e.g., COUNT, SUM, MIN, MAX, and MEDIAN) and guarantees that the level of privacy disclosure does not exceed thresholds
predetermined by the data owners. We demonstrate that our approach is efficient and can be implemented in existing OLAP systems
with little modification. It also satisfies the simulatable auditing model and leaks no private information through query rejections.
Through performance analysis, we show that compared with previous approaches, our approach provides more effective privacy
protection while maintaining a higher level of query-answer availability.
INTRODUCTION
Online analytical processing (OLAP) is one of the most
popular decision support and knowledge discovery
techniques in business-intelligence systems [20]. However, it
is a challenge to enable OLAP on private data without
violating the data owners’ privacy. Traditional studies on
database security provide access control and data sanitization
(e.g., removal of personally identifiable information) tools
for the relational back end of OLAP systems [19], [21], [25],
[31]. Nonetheless, since an essential feature of OLAP is to
compute the multidimensional aggregates of stored data, a
major privacy concern in OLAP is the adversarial inference
of (individual) private data points from the aggregate
information. This inference problem cannot be fully addressed
by access control and data sanitization techniques.
BACKGROUND
OLAP System Model
We consider an OLAP system where the data warehouse
server stores data in an n-dimensional data cube, in order to
support aggregate queries on an attribute of interest over
selected data [20]. We refer to this attribute of interest as the
measure attribute. Besides the measure attribute, there are n
dimension attributes, each of which is represented by a
dimension of the data cube. Each (base) cell of the data cube
is the value of the measure attribute for the corresponding
value combination of dimension attributes. For example,
Table 1 shows a two-dimensional data cube with a measure
attribute of sales and two dimension attributes of product
and time. Each cell of the data cube in Table 1 is the sales
amount (i.e., measure attribute) of a product (e.g., Book) in a
month (e.g., April). As in real-world data, some cells in the data
cube may be missing or not applicable (i.e., N/A in Table 1).
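The two-dimensional data cube described above can be sketched in code. The following is an illustrative sketch only (the product names, months, and sales figures are invented, not taken from the paper's Table 1): each base cell is keyed by its combination of dimension-attribute values, holds the measure attribute (sales), and may be `None` for an N/A cell; an aggregate query then sums the applicable cells in a selected region.

```python
# Hypothetical sketch of a two-dimensional data cube: the dimensions are
# 'product' and 'time', the measure attribute is 'sales'. Each base cell is
# keyed by a (product, month) pair; None marks a missing / N/A cell.
data_cube = {
    ("Book", "April"): 120,
    ("Book", "May"): 95,
    ("Pen", "April"): None,   # N/A cell, excluded from aggregates
    ("Pen", "May"): 40,
}

def aggregate_sum(cube, product=None, month=None):
    """SUM over all applicable base cells matching the selected dimension values."""
    return sum(
        v
        for (p, m), v in cube.items()
        if v is not None
        and (product is None or p == product)
        and (month is None or m == month)
    )

print(aggregate_sum(data_cube, product="Book"))  # Book sales over all months: 120 + 95 = 215
```

Leaving a dimension argument as `None` aggregates over that entire dimension, which mirrors how an OLAP query selects a region of the cube rather than a single cell.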
Privacy Requirements
Due to privacy concerns, the owner of a data cube may not
want a user to have access to all the information stored in it.
Privacy requirements on a data cube may be defined over
the cells in the data cube, and/or the data tuples in the
relational back end of it. In this paper, we choose a cell as the
granularity for specification of privacy requirements due to
our focus on the processing of OLAP queries. Nonetheless,
as evidenced by our privacy measure in Section 3, privacy
requirements on tuples can also be transformed to requirements
on data cube cells. In our model, a user does not have
the right to access all cells in the data cube. We refer to all
cells that a user cannot access as sensitive cells for the user.
SYSTEM MODEL
In this section, we introduce the system models for privacy
protection in OLAP. In particular, we present the system
settings, models of privacy intrusion attacks and defensive
countermeasures, performance measures for privacy disclosure
and query availability, as well as the formal
problem definition of inference control in OLAP.
System Settings
Let there be one data warehouse server and a number of
users C_1, ..., C_U in the system. For each user C_k (k ∈ [1, U]),
let G_k be the set of cells in the data cube that C_k has the right
to access. We refer to G_k as the set of preknown cells of C_k.
Since the data warehouse server needs to properly enforce
access control on the sensitive cells, we assume that it
knows G_k for every k ∈ [1, U] as preknowledge.
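The relationship between preknown and sensitive cells can be sketched as follows (a minimal sketch with invented cell keys and user names; `sensitive_cells` is a hypothetical helper, not from the paper): the server keeps, for each user C_k, the preknown set G_k, and the sensitive cells for that user are simply the complement of G_k within the data cube.

```python
# All base cells of a tiny illustrative data cube, keyed by dimension values.
all_cells = {("Book", "April"), ("Book", "May"), ("Pen", "April"), ("Pen", "May")}

# Preknown sets G_k, one per user C_k, stored by the data warehouse server
# as preknowledge for access control and inference control.
preknown = {
    "C1": {("Book", "April"), ("Book", "May")},  # G_1
    "C2": {("Pen", "May")},                      # G_2
}

def sensitive_cells(user):
    """Cells the user cannot access: the complement of the preknown set G_k."""
    return all_cells - preknown[user]

print(sorted(sensitive_cells("C1")))
```

Since the server already enforces access control per user, assuming it knows each G_k adds no new trust requirement; the same sets simply feed the inference control decision as well.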
INFERENCE CONTROL ALGORITHM
In this section, we present the detailed algorithm of our
information-theoretic inference control approach. In order to
help readers to better understand the methodology of our
proposed approach, we first show the algorithm on the simple
case of a two-dimensional data cube in which each cell follows
a uniform distribution on {0, 1} and has a constant ℓ(x). After
that, we present a generic algorithm for n-dimensional data
cubes with cells of arbitrary distributions and heterogeneous
ℓ(x). At the end of this section, we shall show that both
algorithms satisfy the simulatable auditing model [23] and
disclose no private information through query rejections.
Simulatability of Algorithms A and B
We now show that both Algorithms A and B satisfy the
simulatable auditing model [23]. Note that either algorithm
makes the answer/reject decision for a received query q
based on q, the query history Q_k, the preknown set G_k, the
prior distribution of the cells, and nothing else. Neither
algorithm uses the values of sensitive cells or the answer to
q as input. As such, all inputs to Algorithms A and B are
public information available to the users. Thus, the answer/
reject decisions made by the data warehouse server are fully
simulatable by the users. By the definition of simulatable
auditing [23], the auditor (i.e., the data warehouse
server) is therefore simulatable with Algorithms A and B.
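The simulatability argument can be made concrete with a sketch of the decision function's interface. The body below uses a deliberately trivial placeholder rule, not the paper's Algorithms A or B, and the name `decide_query` is invented; the point is the signature: every parameter is public information, so any user can run the identical function and predict each answer/reject decision, which is why a rejection leaks nothing.

```python
def decide_query(q, query_history, preknown_cells, prior):
    """Return True to answer query q, False to reject it.

    The parameters are exactly the inputs named above: the new query q, the
    user's query history Q_k, the preknown set G_k, and the prior
    distribution of the cells. Crucially, neither the values of sensitive
    cells nor the answer to q appears here, so the decision is fully
    simulatable by the user.
    """
    # Toy placeholder rule (NOT the paper's algorithm): answer only if the
    # query aggregates at least two cells the user does not already know.
    unknown = [c for c in q if c not in preknown_cells]
    return len(unknown) >= 2

# A user holding the same public inputs reproduces the server's decision.
print(decide_query(["a", "b"], [], set(), None))   # True
print(decide_query(["a", "b"], [], {"b"}, None))   # False
```

Because the function is deterministic in its public inputs, the server's decision sequence and a user's local simulation of it coincide by construction.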