Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves data utility better than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves utility better than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
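To make "partitions the data both horizontally and vertically" concrete, here is a minimal sketch under stated assumptions. The abstract does not say how the two partitions are combined, so the sketch assumes that within each bucket (horizontal partition) the tuples of each column group (vertical partition) are shuffled independently, which breaks the link between values from different groups. The helper names `slice_table` and `satisfies_distinct_l_diversity`, the column grouping, the bucket size, and the toy records are all hypothetical. Buckets are simply consecutive rows, and the per-bucket distinct-value check is a simplified stand-in for the ℓ-diversity requirement, not the paper's algorithm.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
# A sliced table: buckets -> column groups -> (shuffled) tuples of that group's values.
SlicedTable = List[List[List[Tuple[str, ...]]]]


def slice_table(rows: Sequence[Row],
                column_groups: Sequence[Sequence[str]],
                bucket_size: int,
                seed: int = 0) -> SlicedTable:
    """Hypothetical helper: split rows into buckets and attributes into column groups.

    Assumption: within every bucket, the tuples of each column group are shuffled
    independently, so values from different groups cannot be reliably re-linked
    to the same original record.
    """
    rng = random.Random(seed)
    sliced: SlicedTable = []
    for start in range(0, len(rows), bucket_size):
        # Horizontal partition: consecutive rows for simplicity; a real
        # partitioning would group similar tuples together.
        bucket = rows[start:start + bucket_size]
        group_tables = []
        for group in column_groups:  # vertical partition into column groups
            values = [tuple(row[attr] for attr in group) for row in bucket]
            rng.shuffle(values)  # break linkage across column groups in this bucket
            group_tables.append(values)
        sliced.append(group_tables)
    return sliced


def satisfies_distinct_l_diversity(sliced: SlicedTable, group_index: int,
                                   attr_position: int, ell: int) -> bool:
    """Simplified check: every bucket holds at least `ell` distinct sensitive values."""
    return all(
        len({t[attr_position] for t in bucket[group_index]}) >= ell
        for bucket in sliced
    )


# Made-up toy records; "Disease" plays the role of the sensitive attribute.
rows: List[Row] = [
    {"Age": "22", "Sex": "M", "Zip": "47906", "Disease": "dyspepsia"},
    {"Age": "22", "Sex": "F", "Zip": "47906", "Disease": "flu"},
    {"Age": "33", "Sex": "F", "Zip": "47905", "Disease": "flu"},
    {"Age": "52", "Sex": "F", "Zip": "47905", "Disease": "bronchitis"},
    {"Age": "54", "Sex": "M", "Zip": "47302", "Disease": "flu"},
    {"Age": "60", "Sex": "M", "Zip": "47302", "Disease": "dyspepsia"},
    {"Age": "60", "Sex": "M", "Zip": "47304", "Disease": "dyspepsia"},
    {"Age": "64", "Sex": "F", "Zip": "47304", "Disease": "gastritis"},
]

# Two column groups: (Age, Sex) and (Zip, Disease); buckets of four rows each.
sliced = slice_table(rows, [("Age", "Sex"), ("Zip", "Disease")], bucket_size=4)

if __name__ == "__main__":
    for i, bucket in enumerate(sliced):
        print(f"bucket {i}: {bucket}")
    print(satisfies_distinct_l_diversity(sliced, group_index=1, attr_position=1, ell=3))
```

On the toy records each bucket contains three distinct diseases, so the simplified check passes for ℓ = 3 and fails for ℓ = 4. Keeping the sensitive attribute in the same column group as Zip mirrors the idea that slicing need not separate quasi-identifiers from sensitive attributes, which is one of the limitations of bucketization noted above.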
Introduction
Privacy-preserving data mining is the area of data mining that aims to safeguard sensitive information from unsanctioned disclosure. The problem has become more important in recent years because of the increasing ability to store personal data about users. A number of techniques, such as randomization, k-anonymity, bucketization, and generalization, have been proposed to perform privacy-preserving data mining. According to recent work, generalization loses a significant amount of information when applied to high-dimensional data. Bucketization, in turn, does not prevent membership disclosure and is not applicable to data that lack a clear distinction between sensitive attributes and quasi-identifying attributes. This paper therefore presents a technique for preserving the privacy of high-dimensional data.