An efficient sliced data algorithm design for personalized data protection

**study tips** · 20-05-2013, 04:12 PM

An efficient sliced data algorithm design for personalized data protection to prevent generalized losses and membership divulgence

Introduction

In recent years, data anonymization techniques for many kinds of structured data, including tabular, graph and item-set data, have been the subject of much research. In this thesis, we present a brief yet systematic review of anonymization techniques, such as generalization and bucketization, that have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. We present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Our focus is therefore on an effective method that provides better data utility and can handle high-dimensional data.
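To make the horizontal and vertical partitioning concrete, here is a minimal sketch of how a sliced table could be produced once the attribute (column) partition and the tuple (bucket) partition have already been chosen. The helper name `slice_table`, the attributes Age, Sex, Zipcode and Disease, and the records are hypothetical illustrations, not taken from the thesis.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def slice_table(rows: Sequence[Row],
                columns: Sequence[Sequence[str]],
                buckets: Sequence[Sequence[int]],
                seed: int = 0) -> List[List[List[Tuple[object, ...]]]]:
    """Hypothetical helper: returns sliced[bucket][column] -> list of value tuples."""
    rng = random.Random(seed)
    sliced = []
    for bucket in buckets:                  # horizontal partition (tuples)
        bucket_columns = []
        for column in columns:              # vertical partition (attributes)
            # Project every tuple in the bucket onto this column's attributes.
            values = [tuple(rows[i][a] for a in column) for i in bucket]
            # Permute each column independently, which breaks the link
            # between columns inside the bucket.
            rng.shuffle(values)
            bucket_columns.append(values)
        sliced.append(bucket_columns)
    return sliced


# Hypothetical toy microdata.
rows = [
    {"Age": 22, "Sex": "M", "Zipcode": "47906", "Disease": "dyspepsia"},
    {"Age": 22, "Sex": "F", "Zipcode": "47906", "Disease": "flu"},
    {"Age": 33, "Sex": "F", "Zipcode": "47905", "Disease": "flu"},
    {"Age": 52, "Sex": "F", "Zipcode": "47905", "Disease": "bronchitis"},
]
columns = [("Age", "Sex"), ("Zipcode", "Disease")]
buckets = [[0, 1], [2, 3]]

for b, cols in enumerate(slice_table(rows, columns, buckets, seed=1)):
    print(f"bucket {b}: {cols}")
```

Each bucket still contains the exact values of every column; only the association between columns within a bucket is hidden.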

Comparison with Generalization

There are several types of recoding for generalization. The recoding that preserves the most information is local recoding. In local recoding, one first groups tuples into buckets and then, for each bucket, replaces all values of one attribute with a generalized value. Such a recoding is local because the same attribute value may be generalized differently when it appears in different buckets. We now show that slicing preserves more information than such a local recoding approach, assuming that the same tuple partition is used. We achieve this by showing that slicing is better than an enhanced version of the local recoding approach.

Another important advantage of slicing is its ability to handle high-dimensional data. By partitioning attributes into columns, slicing reduces the dimensionality of the data: each column of the table can be viewed as a sub-table with a lower dimensionality. Slicing also differs from the approach of publishing multiple independent sub-tables in that, in slicing, these sub-tables are linked by the buckets.
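The local recoding step described above can be sketched as follows, assuming a numeric attribute that is generalized to the `[min-max]` range of its bucket. The range-based generalization and the helper name `local_recode` are illustrative assumptions; the thesis does not fix a particular generalization hierarchy.

```python
from typing import Dict, List, Sequence


def local_recode(rows: Sequence[Dict[str, object]],
                 buckets: Sequence[Sequence[int]],
                 attr: str) -> List[Dict[str, object]]:
    """Hypothetical helper: replace `attr` in each bucket with that bucket's [min-max] range."""
    out = [dict(r) for r in rows]  # copy, so the input table is not modified
    for bucket in buckets:
        values = [rows[i][attr] for i in bucket]
        label = f"[{min(values)}-{max(values)}]"
        for i in bucket:
            out[i][attr] = label
    return out


rows = [{"Age": 22}, {"Age": 30}, {"Age": 30}, {"Age": 52}]
recoded = local_recode(rows, buckets=[[0, 1], [2, 3]], attr="Age")
print([r["Age"] for r in recoded])
# ['[22-30]', '[22-30]', '[30-52]', '[30-52]']
```

The value 30 becomes `[22-30]` in one bucket and `[30-52]` in the other, which is exactly what makes the recoding local.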

Comparison with Bucketization

To compare slicing with bucketization, we first note that bucketization can be viewed as a special case of slicing with exactly two columns: one containing only the sensitive attribute and the other containing all the quasi-identifiers. The advantages of slicing over bucketization can then be understood as follows. First, by partitioning attributes into more than two columns, slicing can be used to prevent membership disclosure. Second, unlike bucketization, which requires a clear separation between the QI attributes and the sensitive attribute, slicing can be used without such a separation.

SLICING ALGORITHMS

This section presents an efficient slicing algorithm that achieves ℓ-diverse slicing. Given a microdata table T and two parameters c and ℓ, the algorithm computes a sliced table that consists of c columns and satisfies the privacy requirement of ℓ-diversity. The algorithm consists of three phases: attribute partitioning, column generalization, and tuple partitioning. We now describe these phases.
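The tuple-partitioning phase is not spelled out in this excerpt, so the following is only a hedged illustration of the idea. It splits buckets at the median of one numeric attribute for as long as both halves keep at least ℓ distinct sensitive values. That "distinct ℓ-diversity" test is a simplifying assumption; the ℓ-diversity requirement the actual algorithm enforces on sliced tables may differ. The function names and the toy data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence

Row = Dict[str, object]


def is_distinct_l_diverse(rows: Sequence[Row], bucket: Sequence[int],
                          sensitive: str, ell: int) -> bool:
    # Simplified check (assumption): at least `ell` distinct sensitive values.
    return len({rows[i][sensitive] for i in bucket}) >= ell


def partition_tuples(rows: Sequence[Row], split_attr: str,
                     sensitive: str, ell: int) -> List[List[int]]:
    """Hypothetical median-split tuple partitioning under the simplified check."""
    everything = list(range(len(rows)))
    if not is_distinct_l_diverse(rows, everything, sensitive, ell):
        raise ValueError("the whole table is not l-diverse")
    queue = deque([everything])
    result = []
    while queue:
        bucket = queue.popleft()
        ordered = sorted(bucket, key=lambda i: rows[i][split_attr])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        # Split only if both halves are non-empty and still pass the check.
        if (left and is_distinct_l_diverse(rows, left, sensitive, ell)
                and is_distinct_l_diverse(rows, right, sensitive, ell)):
            queue.extend([left, right])
        else:
            result.append(bucket)
    return result


ages = [20, 21, 22, 23, 40, 41, 42, 43]
diseases = ["flu", "cold"] * 4
rows = [{"Age": a, "Disease": d} for a, d in zip(ages, diseases)]
print(partition_tuples(rows, "Age", "Disease", ell=2))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Every returned bucket passes the simplified check, and a bucket stops splitting as soon as either half would fall below ℓ distinct sensitive values.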

Attribute Partitioning

Our algorithm partitions attributes so that highly correlated attributes are in the same column. This is good for both utility and privacy. In terms of data utility, grouping highly correlated attributes preserves the correlations among those attributes. In terms of privacy, the association of uncorrelated attributes presents higher identification risks than the association of highly correlated attributes, because combinations of uncorrelated attribute values are much less frequent and thus more identifiable. Therefore, it is better to break the associations between uncorrelated attributes in order to protect privacy.
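The excerpt does not name the correlation measure or the grouping method, so the sketch below makes two explicit assumptions. It measures correlation between two categorical attributes with the normalized mean-square contingency coefficient (equal to the squared Cramér's V), and it forms c columns by average-linkage agglomerative clustering. Both are illustrative choices, not necessarily what the thesis uses; the helper names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Row = Dict[str, object]


def phi_squared(rows: Sequence[Row], a: str, b: str) -> float:
    """Normalized mean-square contingency coefficient of attributes a and b."""
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    fa = Counter(r[a] for r in rows)
    fb = Counter(r[b] for r in rows)
    d = min(len(fa), len(fb))
    if d <= 1:
        return 0.0  # a constant attribute carries no correlation
    total = 0.0
    for va, ca in fa.items():
        for vb, cb in fb.items():
            expected = (ca / n) * (cb / n)
            observed = joint[(va, vb)] / n  # Counter returns 0 for unseen pairs
            total += (observed - expected) ** 2 / expected
    return total / (d - 1)


def average_linkage(corr: Dict[FrozenSet[str], float],
                    left: List[str], right: List[str]) -> float:
    pairs = [frozenset((x, y)) for x in left for y in right]
    return sum(corr[p] for p in pairs) / len(pairs)


def partition_attributes(rows: Sequence[Row], attrs: List[str], c: int) -> List[List[str]]:
    """Hypothetical helper: merge the most correlated columns until c remain."""
    corr = {frozenset(p): phi_squared(rows, *p) for p in combinations(attrs, 2)}
    columns = [[a] for a in attrs]
    while len(columns) > c:
        i, j = max(combinations(range(len(columns)), 2),
                   key=lambda ij: average_linkage(corr, columns[ij[0]], columns[ij[1]]))
        columns[i] = columns[i] + columns[j]  # i < j, so deleting j is safe
        del columns[j]
    return columns


rows = [
    {"Zipcode": "Z1", "City": "C1", "Sex": "M"},
    {"Zipcode": "Z1", "City": "C1", "Sex": "F"},
    {"Zipcode": "Z2", "City": "C2", "Sex": "M"},
    {"Zipcode": "Z2", "City": "C2", "Sex": "F"},
]
print(partition_attributes(rows, ["Zipcode", "City", "Sex"], c=2))
# [['Zipcode', 'City'], ['Sex']]
```

Zipcode and City determine each other in this toy table (coefficient 1.0), while Sex is independent of both (coefficient 0.0), so the two correlated attributes end up in the same column.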

PROBLEM DEFINITION

Two popular anonymization techniques are generalization and bucketization. Generalization replaces a value with a “less-specific but semantically consistent” value. Three types of recoding schemes have been proposed for generalization: global recoding, regional recoding, and local recoding. Global recoding has the property that multiple occurrences of the same value are always replaced by the same generalized value.
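A minimal sketch of global recoding, assuming two hypothetical generalization hierarchies: ages are mapped to 10-year ranges and zip codes keep only their first three digits. Because one fixed mapping is applied to every row, equal values always receive the same generalized value, which is the defining property stated above.

```python
from typing import Dict, List, Sequence


def generalize_age(age: int, width: int = 10) -> str:
    # Hypothetical hierarchy: 22 -> "[20-29]"
    lo = (age // width) * width
    return f"[{lo}-{lo + width - 1}]"


def generalize_zip(zipcode: str, keep: int = 3) -> str:
    # Hypothetical hierarchy: "47906" -> "479**"
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def global_recode(rows: Sequence[Dict[str, object]]) -> List[Dict[str, object]]:
    # The same mapping is used for every row, so the recoding is global.
    return [{**r, "Age": generalize_age(r["Age"]),
             "Zipcode": generalize_zip(r["Zipcode"])} for r in rows]


rows = [
    {"Age": 22, "Zipcode": "47906", "Disease": "flu"},
    {"Age": 22, "Zipcode": "47905", "Disease": "dyspepsia"},
    {"Age": 33, "Zipcode": "47906", "Disease": "bronchitis"},
]
for r in global_recode(rows):
    print(r)
```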
Bucketization first partitions the tuples in the table into buckets and then separates the quasi-identifiers from the sensitive attribute by randomly permuting the sensitive attribute values in each bucket.
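The bucketization step just described can be sketched as follows, assuming the buckets are already given. The helper name `bucketize`, the `Bucket` label, and the records are hypothetical. Quasi-identifier values are published exactly, and only the sensitive values are shuffled within each bucket.

```python
import random
from typing import Dict, List, Sequence


def bucketize(rows: Sequence[Dict[str, object]],
              buckets: Sequence[Sequence[int]],
              sensitive: str, seed: int = 0) -> List[Dict[str, object]]:
    """Hypothetical helper: keep QI values, permute sensitive values per bucket."""
    rng = random.Random(seed)
    published = []
    for b_id, bucket in enumerate(buckets):
        sa_values = [rows[i][sensitive] for i in bucket]
        rng.shuffle(sa_values)  # breaks the QI-to-SA link inside the bucket
        for i, sa in zip(bucket, sa_values):
            qi = {k: v for k, v in rows[i].items() if k != sensitive}
            published.append({**qi, sensitive: sa, "Bucket": b_id})
    return published


rows = [
    {"Age": 22, "Zipcode": "47906", "Disease": "dyspepsia"},
    {"Age": 22, "Zipcode": "47906", "Disease": "flu"},
    {"Age": 33, "Zipcode": "47905", "Disease": "flu"},
    {"Age": 52, "Zipcode": "47905", "Disease": "bronchitis"},
]
for r in bucketize(rows, [[0, 1], [2, 3]], "Disease", seed=3):
    print(r)
```

Because every quasi-identifier combination still appears verbatim, an adversary who knows a person's QI values can check whether that person is in the table, which is why bucketization does not prevent membership disclosure.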

OBJECTIVES

We introduce a new technique for privacy-preserving data publishing that has several advantages over generalization and bucketization. We develop an efficient algorithm for computing a sliced table that satisfies ℓ-diversity. Our algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Highly correlated attributes are placed in the same column, which preserves the correlations between them.
We would like to point out a useful property of slicing that is important for privacy protection: a tuple can potentially match multiple buckets, i.e., each tuple can have more than one matching bucket. This differs from previous work on generalization and bucketization, where each tuple belongs to a unique equivalence class (or bucket). In fact, it has been recognized that restricting a tuple to a unique bucket helps the adversary but does not improve data utility.
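To illustrate how one tuple can match several buckets, the sketch below uses a deliberately simplified matching rule, which is an assumption rather than the thesis's definition. A bucket matches if, for every column, some value in that bucket's column agrees with all the attribute values the adversary knows. The sliced table is written out by hand, and the helper name `matching_buckets` and all values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def matching_buckets(sliced: Sequence[Sequence[Sequence[Tuple[object, ...]]]],
                     columns: Sequence[Sequence[str]],
                     known: Dict[str, object]) -> List[int]:
    """Hypothetical rule: a bucket matches if every column has a consistent value."""
    matches = []
    for b_id, bucket_cols in enumerate(sliced):
        ok = True
        for col, values in zip(columns, bucket_cols):
            # Positions in this column whose attribute the adversary knows.
            idx = [k for k, attr in enumerate(col) if attr in known]
            if not any(all(v[k] == known[col[k]] for k in idx) for v in values):
                ok = False
                break
        if ok:
            matches.append(b_id)
    return matches


columns = [("Age", "Sex"), ("Zipcode", "Disease")]
sliced = [  # sliced[bucket][column] -> value tuples (hand-written toy table)
    [[(22, "M"), (22, "F")], [("47906", "flu"), ("47905", "dyspepsia")]],
    [[(22, "M"), (52, "F")], [("47906", "bronchitis"), ("47905", "flu")]],
]
print(matching_buckets(sliced, columns, {"Age": 22, "Sex": "M", "Zipcode": "47906"}))
# [0, 1]
```

Under this rule, someone known to be 22, male, and living in 47906 is consistent with both buckets, so the adversary cannot pin the tuple to a single bucket.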

QUANTITATIVE METHODOLOGIES:

Quantitative methodologies allow researchers to carry out evaluations within a more controlled context. Studies of this type tend to "assign numbers to the data" gathered. Several different types of experiments fall under this larger methodology, including statistical and correlational analysis, surveys and controlled experiments. Statistical and correlational analysis consists of analyzing the relation between multiple variables.

QUALITATIVE METHODOLOGIES:

Qualitative methodologies differ greatly from the quantitative model, as they seek to obtain information that will "reflect the content and meaning of an event or the perspective of an individual." Qualitative methodologies include interviews, observation, field research and questionnaires/surveys. Interviews, which may be structured or unstructured, are similar to surveys but are often more intensive in their search for detail. Observation takes the form of participation, as the researcher collects data from within the subject's world. Another form of participative study is fieldwork, in which researchers observe firsthand, take notes and later analyze the results. Questionnaires, like surveys, are blank forms that researchers ask participants to complete.
