20-05-2013, 04:12 PM
An efficient sliced data algorithm design for personalized data protection to prevent generalized losses and membership divulgence
Introduction
In recent years, data anonymization techniques have been the subject of research for many kinds of structured data, including tabular, graph and item set data. In this thesis, we present a brief yet systematic review of several anonymization techniques, such as generalization and bucketization, that have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. We present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization, can handle high-dimensional data, and can be used for membership disclosure protection.
Comparison with Generalization
There are several types of recoding for generalization. The recoding that preserves the most information is local recoding. In local recoding, one first groups tuples into buckets and then, for each bucket, replaces all values of one attribute with a generalized value. Such a recoding is local because the same attribute value may be generalized differently when it appears in different buckets. We now show that slicing preserves more information than such a local recoding approach, assuming that the same tuple partition is used. We achieve this by showing that slicing is better than an enhancement of the local recoding approach.
Another important advantage of slicing is its ability to handle high-dimensional data. By partitioning attributes into columns, slicing reduces the dimensionality of the data: each column of the table can be viewed as a sub-table with a lower dimensionality. Slicing also differs from the approach of publishing multiple independent sub-tables in that, in slicing, these sub-tables are linked by the buckets.
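The "local" property of local recoding can be illustrated with a small sketch. Assuming two hypothetical buckets of Age values and a [min-max] interval as the generalized value (an illustrative choice, not one fixed by the text), the same age 52 is recoded to different intervals in different buckets. For contrast, a slicing column that holds only Age keeps the exact values of each bucket, losing only their order.

```python
from typing import Dict, List


def local_recode(buckets: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Local recoding: every value in a bucket is replaced by that bucket's
    [min-max] interval (hypothetical choice of generalized value)."""
    recoded = {}
    for name, values in buckets.items():
        interval = f"[{min(values)}-{max(values)}]"
        recoded[name] = [interval] * len(values)
    return recoded


def single_attribute_column(buckets: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """What a one-attribute slicing column keeps: the exact multiset of values
    per bucket (sorted here; a real release would randomly permute them)."""
    return {name: sorted(values) for name, values in buckets.items()}


# Hypothetical toy buckets of Age values.
ages = {"B1": [22, 22, 33, 52], "B2": [52, 60, 64]}
print(local_recode(ages))             # 52 becomes [22-52] in B1 but [52-64] in B2
print(single_attribute_column(ages))  # exact values survive in each bucket
```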
Comparison with Bucketization
To compare slicing with bucketization, we first note that bucketization can be viewed as a special case of slicing in which there are exactly two columns: one containing only the sensitive attribute and the other containing all the quasi-identifiers. The advantages of slicing over bucketization can be understood as follows. First, by partitioning attributes into more than two columns, slicing can be used to prevent membership disclosure. Second, unlike bucketization, which requires a clear separation between the QI attributes and the sensitive attribute, slicing can be used without such a separation.
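A minimal sketch of the slicing operation makes the special-case relationship concrete. It assumes the tuple partition (buckets as lists of row indices) and the attribute partition (columns) are already given; within each bucket, the values of every column are permuted independently, which breaks the links between columns. The attribute names and toy rows are hypothetical. Passing two columns, all QIs in one and the sensitive attribute alone in the other, gives the bucketized form.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]


def slice_table(rows: List[Row], columns: List[List[str]],
                buckets: List[List[int]], seed: int = 0) -> List[List[Tuple]]:
    """Sketch of slicing: vertical partition into `columns`, horizontal
    partition into `buckets` (lists of row indices). Each column's values are
    shuffled independently inside a bucket, then re-zipped into sliced tuples."""
    rng = random.Random(seed)
    sliced = []
    for bucket in buckets:
        col_values = []
        for col in columns:
            vals = [tuple(rows[i][a] for a in col) for i in bucket]
            rng.shuffle(vals)  # independent permutation per column
            col_values.append(vals)
        sliced.append([tuple(cv[k] for cv in col_values) for k in range(len(bucket))])
    return sliced


# Hypothetical toy microdata.
rows = [
    {"Age": 22, "Sex": "M", "Zip": "47906", "Disease": "flu"},
    {"Age": 33, "Sex": "F", "Zip": "47905", "Disease": "cold"},
    {"Age": 52, "Sex": "F", "Zip": "47905", "Disease": "flu"},
    {"Age": 60, "Sex": "M", "Zip": "47304", "Disease": "cold"},
]
buckets = [[0, 1], [2, 3]]
# Two columns (all QIs | sensitive attribute): the bucketization special case.
bucketized = slice_table(rows, [["Age", "Sex", "Zip"], ["Disease"]], buckets)
# A different split of attributes into two columns: general slicing.
sliced = slice_table(rows, [["Age", "Sex"], ["Zip", "Disease"]], buckets)
```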
SLICING ALGORITHMS
This section presents an efficient slicing algorithm that achieves ℓ-diverse slicing. Given a microdata table T and two parameters c and ℓ, the algorithm computes a sliced table that consists of c columns and satisfies the privacy requirement of ℓ-diversity. The algorithm consists of three phases: attribute partitioning, column generalization, and tuple partitioning. We now describe these phases.
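The tuple-partitioning phase can be sketched as a top-down split loop. This is a simplified sketch under stated assumptions: column generalization is omitted, buckets are halved on the median of a single hypothetical split attribute, and each bucket is checked with a plain frequency-based ℓ-diversity test (no sensitive value may make up more than 1/ℓ of a bucket). A full ℓ-diverse slicing check would also have to account for a tuple matching several buckets, so treat this as an illustration rather than the algorithm itself.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def is_l_diverse(bucket: List[Row], sensitive: str, ell: int) -> bool:
    """Simplified check (assumption): no sensitive value accounts for more
    than 1/ell of the bucket's tuples."""
    counts = Counter(r[sensitive] for r in bucket)
    return max(counts.values()) * ell <= len(bucket)


def partition_tuples(rows: List[Row], split_attr: str, sensitive: str,
                     ell: int) -> List[List[Row]]:
    """Top-down splitting: halve a bucket on the median of `split_attr` and
    keep the split only if both halves stay ell-diverse; otherwise the bucket
    is final."""
    queue, done = [rows], []
    while queue:
        bucket = queue.pop()
        ordered = sorted(bucket, key=lambda r: r[split_attr])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        if (left and right and is_l_diverse(left, sensitive, ell)
                and is_l_diverse(right, sensitive, ell)):
            queue.extend([left, right])
        else:
            done.append(bucket)
    return done


# Hypothetical toy data: ages 20..27 with alternating diagnoses.
rows = [{"Age": 20 + i, "Disease": ["flu", "cold"][i % 2]} for i in range(8)]
buckets = partition_tuples(rows, "Age", "Disease", ell=2)
```

Every accepted split strictly shrinks the buckets, so the loop terminates. A single-tuple bucket can never be split further because one of its halves is empty.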
Attribute Partitioning
Our algorithm partitions attributes so that highly correlated attributes are in the same column. This benefits both utility and privacy. In terms of data utility, grouping highly correlated attributes preserves the correlations among those attributes. In terms of privacy, the association of uncorrelated attributes presents higher identification risks than the association of highly correlated attributes, because combinations of uncorrelated attribute values are much less frequent and thus more identifiable. It is therefore better to break the associations between uncorrelated attributes in order to protect privacy.
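Attribute partitioning needs two ingredients: a pairwise correlation measure and a way to group attributes into c columns. The sketch below assumes the normalized mean-square contingency coefficient φ² for categorical attributes (0 for independent attributes, 1 for perfectly correlated ones). It also assumes a simple greedy grouping that repeatedly merges the two columns with the highest average pairwise correlation until c columns remain. The text above does not fix either choice, so both are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

Row = Dict[str, object]


def phi_squared(rows: List[Row], a: str, b: str) -> float:
    """Normalized mean-square contingency coefficient between attributes a, b
    (assumed correlation measure)."""
    n = len(rows)
    fa = Counter(r[a] for r in rows)
    fb = Counter(r[b] for r in rows)
    fab = Counter((r[a], r[b]) for r in rows)
    d = min(len(fa), len(fb))
    if d < 2:
        return 0.0  # a constant attribute carries no correlation
    total = 0.0
    for va, ca in fa.items():
        for vb, cb in fb.items():
            expected = (ca / n) * (cb / n)
            observed = fab[(va, vb)] / n
            total += (observed - expected) ** 2 / expected
    return total / (d - 1)


def partition_attributes(rows: List[Row], attrs: List[str], c: int) -> List[List[str]]:
    """Greedy average-link grouping (assumed method) of attrs into c columns."""
    corr = {frozenset(p): phi_squared(rows, *p) for p in combinations(attrs, 2)}

    def link(x: List[str], y: List[str]) -> float:
        return sum(corr[frozenset((p, q))] for p in x for q in y) / (len(x) * len(y))

    columns = [[a] for a in attrs]
    while len(columns) > c:
        i, j = max(combinations(range(len(columns)), 2),
                   key=lambda ij: link(columns[ij[0]], columns[ij[1]]))
        columns[i] = columns[i] + columns[j]  # i < j, so deleting j is safe
        del columns[j]
    return columns


# Hypothetical data: A and B move together, C and D move together,
# and the two pairs are independent of each other.
rows = [{"A": x, "B": x, "C": y, "D": y} for x in (0, 1) for y in (0, 1)]
cols = partition_attributes(rows, ["A", "C", "B", "D"], c=2)
```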
PROBLEM DEFINITION
Two popular anonymization techniques are generalization and bucketization. Generalization replaces a value with a “less-specific but semantically consistent” value. Three types of recoding schemes have been proposed for generalization: global recoding, regional recoding, and local recoding. Global recoding has the property that multiple occurrences of the same value are always replaced by the same generalized value.
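The defining property of global recoding is that a single mapping is applied to the whole table. The sketch assumes a hypothetical Age hierarchy of fixed-width intervals (width 10 here), so every occurrence of a given age receives the same interval no matter which tuple or bucket it belongs to.

```python
from typing import List


def global_recode_age(age: int, width: int = 10) -> str:
    """One fixed mapping for the whole table (hypothetical interval width):
    the same age always maps to the same interval."""
    low = (age // width) * width
    return f"[{low}-{low + width - 1}]"


# Hypothetical Age column.
ages: List[int] = [22, 22, 33, 52, 52, 60, 64]
recoded = [global_recode_age(a) for a in ages]
print(recoded)
```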
Bucketization first partitions tuples in the table into buckets and then separates the quasi-identifiers from the sensitive attribute by randomly permuting the sensitive attribute values in each bucket.
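The two bucketization steps above can be sketched directly. This assumes the tuple partition is already given as lists of row indices and uses hypothetical attribute names. QI values are published unchanged with a bucket ID, while the sensitive values are shuffled within each bucket, so the exact QI/SA association inside a bucket is lost.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def bucketize(rows: List[Row], buckets: List[List[int]], sensitive: str,
              seed: int = 0) -> List[Row]:
    """Publish QIs as-is plus a bucket ID; randomly permute the sensitive
    attribute values inside each bucket."""
    rng = random.Random(seed)
    published = []
    for bucket_id, members in enumerate(buckets):
        sa_values = [rows[i][sensitive] for i in members]
        rng.shuffle(sa_values)  # break the QI/SA link inside the bucket
        for i, sa in zip(members, sa_values):
            qi = {k: v for k, v in rows[i].items() if k != sensitive}
            published.append({**qi, "Bucket": bucket_id, sensitive: sa})
    return published


# Hypothetical toy microdata and tuple partition.
rows = [
    {"Age": 22, "Zip": "47906", "Disease": "flu"},
    {"Age": 33, "Zip": "47905", "Disease": "cold"},
    {"Age": 52, "Zip": "47905", "Disease": "flu"},
    {"Age": 60, "Zip": "47304", "Disease": "cold"},
]
published = bucketize(rows, [[0, 1], [2, 3]], "Disease")
```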
OBJECTIVES
We introduce slicing, a new technique for privacy-preserving data publishing that has several advantages over generalization and bucketization. We develop an efficient algorithm for computing a sliced table that satisfies ℓ-diversity. The algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Highly correlated attributes are placed in the same column, which preserves the correlations between them.
Slicing has a property that is important for privacy protection: a tuple can potentially match multiple buckets, i.e., a tuple can have more than one matching bucket. This differs from previous work on generalization and bucketization, where each tuple belongs to a unique equivalence class (or bucket). In fact, it has been recognized that restricting a tuple to a unique bucket helps the adversary but does not improve data utility.
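The multiple-matching property can be checked mechanically. In this sketch, a tuple is taken to match a sliced bucket when, for every column, the tuple's projection onto that column's attributes appears among the bucket's values for that column. This is an assumed working definition for illustration. The sliced buckets below are hypothetical toy data in which one tuple matches two buckets, so an adversary cannot pin it to a single one.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
# A sliced bucket: for each column, the list of value-combinations it holds.
SlicedBucket = List[List[Tuple]]


def matching_buckets(t: Row, columns: List[List[str]],
                     sliced: List[SlicedBucket]) -> List[int]:
    """Indices of buckets in which every column contains t's projection onto
    that column (assumed definition of a matching bucket)."""
    matches = []
    for b, bucket in enumerate(sliced):
        if all(tuple(t[a] for a in col) in bucket[k] for k, col in enumerate(columns)):
            matches.append(b)
    return matches


columns = [["Age", "Sex"], ["Zip", "Disease"]]
# Hypothetical sliced table with three buckets.
sliced = [
    [[(22, "M"), (33, "F")], [("47906", "flu"), ("47905", "cold")]],
    [[(22, "M"), (60, "M")], [("47906", "flu"), ("47304", "cold")]],
    [[(52, "F"), (60, "M")], [("47905", "flu"), ("47304", "cold")]],
]
t = {"Age": 22, "Sex": "M", "Zip": "47906", "Disease": "flu"}
print(matching_buckets(t, columns, sliced))  # t matches buckets 0 and 1
```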
QUANTITATIVE METHODOLOGIES:
Quantitative methodologies allow researchers to evaluate data in a more controlled context. These studies tend to "assign numbers to the data" gathered. Several types of experiments fall under this broader methodology, including statistical and correlational analysis, surveys, and controlled experiments. Statistical and correlational analysis examines the relationships among multiple variables.
QUALITATIVE METHODOLOGIES:
Qualitative methodologies differ greatly from the quantitative model, as they seek information that will "reflect the content and meaning of an event or the perspective of an individual." Qualitative methodologies include interviews, observation, field research, and questionnaires/surveys. Interviews, which may be structured or unstructured, are similar to surveys but are often more intensive in their search for detail. Observation takes the form of participation, as the researcher collects data from within the subject's world. Another form of participative study is fieldwork, in which researchers observe firsthand, take notes, and later analyze the results. Questionnaires, like surveys, are blank forms that researchers ask participants to complete.