
PRIVACY PRESERVING MICRO-DATA AND MEMBERSHIP DATA IN HIGH DIMENSIONAL DB USING EXTENDED SLICING TECHNIQUE


Abstract:

The volume of data held in databases is growing enormously, driven in part by social networks such as Facebook and Twitter. Protecting microdata and membership data in high-dimensional databases hosted on servers or in data centers is very difficult. Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Existing work shows that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. To avoid these disadvantages, we propose a novel technique called Extended Slicing, which partitions the data both horizontally and vertically. Extended Slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of Extended Slicing is that it can handle high-dimensional data. We show how Extended Slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing sliced data that obey the l-diversity requirement. Extended Slicing preserves better utility than generalization, is more effective than bucketization in workloads involving the sensitive attribute, and prevents membership disclosure.

INTRODUCTION

Information is today probably the most important and most demanded resource. The wide availability of personal data has made privacy-preserving data mining an important problem. Nowadays, sharing information is difficult across the private, public, and governmental sectors. Many sectors require only specific data, and such data are available only in the form of records. Privacy-preserving publishing of microdata has been studied extensively in recent years. Microdata contain records, each of which holds information about an individual entity, such as a person, a household, or an organization. Several microdata anonymization techniques have been proposed. The most popular ones are generalization for k-anonymity and bucketization for l-diversity. In both generalization and bucketization, one first removes identifiers from the data and then partitions tuples into buckets.

TECHNOLOGIES

Generalization

Our first approach to providing k-anonymity is based on the definition and use of generalization relationships between domains and between the values that attributes can assume. In relational database systems, a domain (e.g., integer, string, date) is associated with each attribute to indicate the set of values that the attribute can assume. We refer to these domains as ground domains. We then extend the notion of domain to capture the generalization process by also assuming the existence of a set of generalized domains containing generalized values, and of a mapping between each domain and its generalizations.

There are several types of recoding for generalization. The recoding that preserves the most information is local recoding [36]. In local recoding, one first groups the tuples into buckets and then, for each bucket, replaces all values of one attribute with a generalized value. Such a recoding is local because the same attribute value may be generalized differently when it appears in different buckets. We now show that Extended Slicing preserves more information than such a local recoding approach, assuming that the same tuple partition is used. We achieve this by showing that Extended Slicing is better than the following enhancement of the local recoding approach: rather than using a generalized value to replace more specific attribute values, one uses the multiset of exact values in each bucket. For example, Table 1b is a generalized table, and Table 1d is the result of using multisets of exact values rather than generalized values. For the Age attribute of the first bucket, we use the multiset of exact values {22, 22, 33, 52} rather than the generalized interval [22-52]. The multiset of exact values provides more information about the distribution of values in each attribute than the generalized interval. Therefore, using multisets of exact values preserves more information than generalization.
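The difference between the two representations can be sketched in a few lines of Python. This is only an illustration using the Age values quoted above; the function names generalize_bucket and multiset_bucket are hypothetical and do not come from the original work.

```python
from collections import Counter

def generalize_bucket(values):
    """Replace a bucket's values with a single generalized interval (lo, hi)."""
    return (min(values), max(values))

def multiset_bucket(values):
    """Keep the exact values of a bucket as a multiset (value -> count)."""
    return Counter(values)

# Age values of the first bucket, as quoted in the text.
ages = [22, 22, 33, 52]

interval = generalize_bucket(ages)   # only the endpoints survive
multiset = multiset_bucket(ages)     # the full value distribution survives

print(interval)
print(sorted(multiset.items()))
```

The interval only records that ages lie somewhere between 22 and 52, while the multiset still shows that 22 occurs twice and that no value falls between 34 and 51.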

The k-anonymity protection model

K-anonymity has been proposed to reduce the risk of re-identification through linking released data with externally available data [12, 13, 15]. The primary goal of k-anonymization is to protect the privacy of the individuals to whom the data pertain. However, subject to this constraint, it is important that the released data remain as “useful” as possible. Numerous recoding models have been proposed in the literature for k-anonymization [8, 9, 13, 17, 10], and often the “quality” of the published data is dictated by the model that is used. Basic protection models termed null-map, k-map, and wrong-map provide protection by ensuring that released information maps to no, k, or incorrect entities, respectively [25]. Determining how many individuals each released tuple actually matches requires combining the released data with externally available data and analyzing other possible attacks. Making such a determination directly can be an extremely difficult task for the data holder who releases the information.
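As a rough illustration of the k-anonymity requirement, the sketch below checks that every combination of quasi-identifier values in a released table is shared by at least k rows. The function name is_k_anonymous, the attribute names, and the sample rows are all made up for this example.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical released table with already generalized Age and Zip values.
released = [
    {"age": "[20-40]", "zip": "479**", "disease": "flu"},
    {"age": "[20-40]", "zip": "479**", "disease": "dyspepsia"},
    {"age": "[41-60]", "zip": "479**", "disease": "flu"},
    {"age": "[41-60]", "zip": "479**", "disease": "bronchitis"},
]

two_anonymous = is_k_anonymous(released, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(released, ["age", "zip"], 3)
print(two_anonymous, three_anonymous)
```

Each quasi-identifier group here contains exactly two rows, so the table passes the check for k = 2 but not for k = 3.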

Bucketization

We first need to carefully describe how the published data is constructed from the underlying table if we are to correctly interpret this published data. That is, we need to specify a sanitization method. We briefly describe two popular sanitization methods.

LIMITATIONS OF EXISTING SYSTEMS

Generalization for k-anonymity suffers from the curse of dimensionality. For generalization to be effective, records in the same bucket must be close to each other so that generalizing the records does not lose too much information. However, in high-dimensional data, most data points lie at similar distances from each other, which forces a great amount of generalization to satisfy k-anonymity even for relatively small values of k. To perform data analysis or data mining tasks on the generalized table, the data analyst has to make the uniform distribution assumption that every value in a generalized interval or set is equally likely, as no other distribution assumption can be justified. This significantly reduces the data utility of the generalized data.
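The cost of the uniform distribution assumption can be shown with a small worked example. The sketch below reuses the Age bucket {22, 22, 33, 52} from the Generalization section and assumes integer ages; the function estimate_uniform and the query range [20, 25] are hypothetical choices made only for illustration.

```python
def estimate_uniform(lo, hi, bucket_size, q_lo, q_hi):
    """Estimate how many of bucket_size tuples fall in [q_lo, q_hi],
    assuming integer values spread uniformly over the interval [lo, hi]."""
    overlap = max(0, min(hi, q_hi) - max(lo, q_lo) + 1)
    return bucket_size * overlap / (hi - lo + 1)

ages = [22, 22, 33, 52]   # exact values hidden behind the interval [22-52]

# Query: how many individuals in this bucket are aged 20 to 25?
true_count = sum(1 for a in ages if 20 <= a <= 25)
estimate = estimate_uniform(22, 52, len(ages), 20, 25)

print(true_count, round(estimate, 3))
```

Under these assumptions the analyst estimates about half a tuple (16/31) in the range, although two of the four individuals actually fall in it.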

MOTIVATION OF EXTENDED SLICING

We present a novel technique called Extended Slicing for privacy-preserving data publishing. Our contributions are as follows. First, we introduce Extended Slicing as a new technique for privacy-preserving data publishing. Extended Slicing has several advantages when compared with generalization and bucketization. It preserves better data utility than generalization. It preserves more attribute correlations with the sensitive attributes (SAs) than bucketization. It can also handle high-dimensional data and data without a clear separation of quasi-identifying attributes (QIs) and SAs. Second, we show that Extended Slicing can be effectively used for preventing attribute disclosure, based on the privacy requirement of l-diversity. We introduce a notion called l-diverse slicing, which ensures that the adversary cannot learn the sensitive value of any individual with a probability greater than 1/l. Third, we develop an efficient algorithm for computing the sliced table that satisfies l-diversity. Our algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Attributes that are highly correlated are placed in the same column.
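The overall idea of partitioning attributes into columns and tuples into buckets can be sketched as follows. This is a simplified illustration and not the authors' algorithm: the column partition is chosen by hand instead of by attribute correlation, buckets are plain consecutive chunks, column generalization is omitted, and the random permutation of each column within a bucket is assumed here from the standard description of slicing. The diversity check is also a simplification that only bounds the share of the most frequent sensitive value in each bucket by 1/l. All names and sample rows are hypothetical.

```python
import random
from collections import Counter

def slice_table(rows, columns, bucket_size, seed=0):
    """Partition attributes into columns and tuples into buckets, then
    permute each column's values independently inside every bucket."""
    rng = random.Random(seed)
    buckets = [rows[i:i + bucket_size] for i in range(0, len(rows), bucket_size)]
    sliced = []
    for bucket in buckets:
        pieces = []
        for col in columns:
            col_values = [tuple(row[a] for a in col) for row in bucket]
            rng.shuffle(col_values)  # breaks the link between columns
            pieces.append(col_values)
        sliced.append(pieces)
    return sliced

def satisfies_l_diversity(sliced, col_index, attr_pos, l):
    """Simplified check: in every bucket the most frequent sensitive value
    accounts for at most 1/l of the tuples."""
    for pieces in sliced:
        values = [v[attr_pos] for v in pieces[col_index]]
        most_common_count = Counter(values).most_common(1)[0][1]
        if most_common_count * l > len(values):
            return False
    return True

# Hypothetical microdata; "disease" plays the role of the sensitive attribute.
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "flu"},
    {"age": 33, "sex": "F", "zip": "47905", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
]
columns = [("age", "sex"), ("zip", "disease")]  # hand-picked column partition

sliced = slice_table(rows, columns, bucket_size=4)
diverse_2 = satisfies_l_diversity(sliced, col_index=1, attr_pos=1, l=2)
diverse_3 = satisfies_l_diversity(sliced, col_index=1, attr_pos=1, l=3)
print(diverse_2, diverse_3)
```

Within a column the attribute values stay together, so correlations such as Zip with Disease are preserved, while the link between the (Age, Sex) column and the (Zip, Disease) column is hidden inside each bucket.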
