09-02-2013, 11:43 AM
Privacy-Preserving OLAP: An Information-Theoretic Approach
Abstract
We address issues related to the protection of private information in Online Analytical Processing (OLAP) systems, where a
major privacy concern is the adversarial inference of private information from OLAP query answers. Most previous work on privacy-preserving
OLAP focuses on a single aggregate function and/or addresses only exact disclosure, which eliminates from consideration
an important class of privacy breaches where partial information, but not exact values, of private data is disclosed (i.e., partial
disclosure). We address privacy protection against both exact and partial disclosure in OLAP systems with mixed aggregate functions.
In particular, we propose an information-theoretic inference control approach that supports a combination of common aggregate
functions (e.g., COUNT, SUM, MIN, MAX, and MEDIAN) and guarantees that the level of privacy disclosure does not exceed thresholds
predetermined by the data owners. We demonstrate that our approach is efficient and can be implemented in existing OLAP systems
with little modification. It also satisfies the simulatable auditing model and leaks no private information through query rejections.
Through performance analysis, we show that compared with previous approaches, our approach provides more effective privacy
protection while maintaining a higher level of query-answer availability.
INTRODUCTION
Online analytical processing (OLAP) is one of the most
popular decision support and knowledge discovery
techniques in business-intelligence systems [20]. However, it
is a challenge to enable OLAP on private data without
violating the data owners’ privacy. Traditional studies on
database security provide access control and data sanitization
(e.g., removal of personally identifiable information) tools
for the relational back end of OLAP systems [19], [21], [25],
[31]. Nonetheless, since an essential feature of OLAP is to
compute the multidimensional aggregates of stored data, a
major privacy concern in OLAP is the adversarial inference
of (individual) private data points from the aggregate
information. This inference problem cannot be fully addressed
by access control and data sanitization techniques.
BACKGROUND
OLAP System Model
We consider an OLAP system where the data warehouse
server stores data in an n-dimensional data cube, in order to
support aggregate queries on an attribute of interest over
selected data [20]. We refer to this attribute of interest as the
measure attribute. Besides the measure attribute, there are n
dimension attributes, each of which is represented by a
dimension of the data cube. Each (base) cell of the data cube
is the value of the measure attribute for the corresponding
value combination of dimension attributes. For example,
Table 1 shows a two-dimensional data cube with a measure
attribute of sales and two dimension attributes of product
and time. Each cell of the data cube in Table 1 is the sales
amount (i.e., measure attribute) of a product (e.g., Book) in a
month (e.g., April). As in real-world data, some cells in the data
cube may be missing or not applicable (i.e., N/A in Table 1).
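The two-dimensional data cube described above can be sketched in code. The following is an illustrative sketch only (the product names, months, and sales figures are invented, not taken from the paper's Table 1): each base cell is keyed by its combination of dimension-attribute values, holds the measure attribute (sales), and may be `None` for an N/A cell; an aggregate query then sums the applicable cells in a selected region.

```python
# Hypothetical sketch of a two-dimensional data cube: the dimensions are
# 'product' and 'time', the measure attribute is 'sales'. Each base cell is
# keyed by a (product, month) pair; None marks a missing / N/A cell.
data_cube = {
    ("Book", "April"): 120,
    ("Book", "May"): 95,
    ("Pen", "April"): None,   # N/A cell, excluded from aggregates
    ("Pen", "May"): 40,
}

def aggregate_sum(cube, product=None, month=None):
    """SUM over all applicable base cells matching the selected dimension values."""
    return sum(
        v
        for (p, m), v in cube.items()
        if v is not None
        and (product is None or p == product)
        and (month is None or m == month)
    )

print(aggregate_sum(data_cube, product="Book"))  # Book sales over all months: 120 + 95 = 215
```

Leaving a dimension argument as `None` aggregates over that entire dimension, which mirrors how an OLAP query selects a region of the cube rather than a single cell.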
Privacy Requirements
Due to privacy concerns, the owner of a data cube may not
want a user to have access to all the information stored in it.
Privacy requirements on a data cube may be defined over
the cells in the data cube, and/or the data tuples in the
relational back end of it. In this paper, we choose a cell as the
granularity for specification of privacy requirements due to
our focus on the processing of OLAP queries. Nonetheless,
as evidenced by our privacy measure in Section 3, privacy
requirements on tuples can also be transformed to requirements
on data cube cells. In our model, a user does not have
the right to access all cells in the data cube. We refer to all
cells that a user cannot access as sensitive cells for the user.
SYSTEM MODEL
In this section, we introduce the system models for privacy
protection in OLAP. In particular, we present the system
settings, models of privacy intrusion attacks and defensive
countermeasures, performance measures for privacy disclosure
and query availability, as well as the formal
problem definition of inference control in OLAP.
System Settings
Let there be one data warehouse server and a number of
users C_1, ..., C_U in the system. For each user C_k (k ∈ [1, U]),
let G_k be the set of cells in the data cube that C_k has the right
to access. We refer to G_k as the set of preknown cells of C_k.
Since the data warehouse server needs to properly enforce
access control on the sensitive cells, we assume that it
knows G_k for every k ∈ [1, U] as preknowledge.
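The relationship between preknown and sensitive cells can be sketched as follows (a minimal sketch with invented cell keys and user names; `sensitive_cells` is a hypothetical helper, not from the paper): the server keeps, for each user C_k, the preknown set G_k, and the sensitive cells for that user are simply the complement of G_k within the data cube.

```python
# All base cells of a tiny illustrative data cube, keyed by dimension values.
all_cells = {("Book", "April"), ("Book", "May"), ("Pen", "April"), ("Pen", "May")}

# Preknown sets G_k, one per user C_k, stored by the data warehouse server
# as preknowledge for access control and inference control.
preknown = {
    "C1": {("Book", "April"), ("Book", "May")},  # G_1
    "C2": {("Pen", "May")},                      # G_2
}

def sensitive_cells(user):
    """Cells the user cannot access: the complement of the preknown set G_k."""
    return all_cells - preknown[user]

print(sorted(sensitive_cells("C1")))
```

Since the server already enforces access control per user, assuming it knows each G_k adds no new trust requirement; the same sets simply feed the inference control decision as well.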
INFERENCE CONTROL ALGORITHM
In this section, we present the detailed algorithm of our
information-theoretic inference control approach. In order to
help readers to better understand the methodology of our
proposed approach, we first show the algorithm on the simple
case of a two-dimensional data cube in which each cell follows
a uniform distribution on {0, 1} and has a constant ℓ(x). After
that, we present a generic algorithm for n-dimensional data
cubes with cells of arbitrary distributions and heterogeneous
ℓ(x). At the end of this section, we shall show that both
algorithms satisfy the simulatable auditing model [23] and
disclose no private information through query rejections.
Simulatability of Algorithms A and B
We now show that both Algorithms A and B satisfy the
simulatable auditing model [23]. Note that either algorithm
makes the answer/reject decision for a received query q
based on q, the query history Q_k, the preknown set G_k, the
prior distribution of the cells, and nothing else. Neither
algorithm uses the values of sensitive cells or the answer to
q as input. As such, all inputs to Algorithms A and B are
public information available to the users. Thus, the answer/
reject decisions made by the data warehouse server are fully
simulatable by the users. By the definition of simulatable
auditing [23], the auditor (i.e., the data warehouse
server) is therefore simulatable with Algorithms A and B.
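The simulatability argument can be made concrete with a sketch of the decision function's interface. The body below uses a deliberately trivial placeholder rule, not the paper's Algorithms A or B, and the name `decide_query` is invented; the point is the signature: every parameter is public information, so any user can run the identical function and predict each answer/reject decision, which is why a rejection leaks nothing.

```python
def decide_query(q, query_history, preknown_cells, prior):
    """Return True to answer query q, False to reject it.

    The parameters are exactly the inputs named above: the new query q, the
    user's query history Q_k, the preknown set G_k, and the prior
    distribution of the cells. Crucially, neither the values of sensitive
    cells nor the answer to q appears here, so the decision is fully
    simulatable by the user.
    """
    # Toy placeholder rule (NOT the paper's algorithm): answer only if the
    # query aggregates at least two cells the user does not already know.
    unknown = [c for c in q if c not in preknown_cells]
    return len(unknown) >= 2

# A user holding the same public inputs reproduces the server's decision.
print(decide_query(["a", "b"], [], set(), None))   # True
print(decide_query(["a", "b"], [], {"b"}, None))   # False
```

Because the function is deterministic in its public inputs, the server's decision sequence and a user's local simulation of it coincide by construction.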