
PRIVACY PRESERVING CLASSIFICATION DATA MINING BASED ON RANDOM PERTURBATION


ABSTRACT

Privacy-preserving classification is one of the main types of privacy-preserving data mining. Its key problem is how to transform the original data and then build a decision tree from the transformed data set. This paper proposes a mining method based on a random perturbation matrix.

Scenario (Information Sharing)

A data owner wants to release a person-specific data table to another party (or the public) for the purpose of classification analysis without sacrificing the privacy of the individuals in the released data.

Condensation approach

It constructs constrained clusters in the data set and then generates pseudo-data from the statistics of these clusters.
Because mining runs on the pseudo-data, existing data mining algorithms do not need to be redesigned.
This approach condenses the data into multiple groups of a predefined size.
A greater amount of information is lost when a large number of records is condensed into a single statistical group entity.
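
The condensation idea can be illustrated with a small sketch. It is not the method from the source: the greedy nearest-neighbour grouping, the function names `condense` and `pseudo_data`, and the choice to sample each attribute independently from a normal distribution are all assumptions made to keep the example short.

```python
import random
import statistics


def condense(records, k, rng):
    """Greedily split numeric records into groups of at least k records.

    Assumption: a group is a random seed record plus its k - 1 nearest
    neighbours (squared Euclidean distance).
    """
    remaining = list(records)
    groups = []
    while len(remaining) >= k:
        seed = remaining.pop(rng.randrange(len(remaining)))
        remaining.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r, seed)))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Leftover records (fewer than k) join the last group so no group is too small.
    if remaining:
        if groups:
            groups[-1].extend(remaining)
        else:
            groups.append(remaining)
    return groups


def pseudo_data(groups, rng):
    """Replace every group by synthetic records drawn from its statistics.

    Assumption: only the per-attribute mean and standard deviation are kept,
    so correlations between attributes inside a group are ignored.
    """
    out = []
    for group in groups:
        columns = list(zip(*group))
        means = [statistics.fmean(c) for c in columns]
        stdevs = [statistics.pstdev(c) for c in columns]
        for _ in group:
            out.append(tuple(rng.gauss(m, s) for m, s in zip(means, stdevs)))
    return out


rng = random.Random(0)
records = [(float(i), float(i % 3)) for i in range(10)]
groups = condense(records, k=3, rng=rng)
released = pseudo_data(groups, rng)
```

Only `released` would be shared. A larger `k` hides individuals better but, as noted above, discards more of the detail in the original records.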

Data Preprocessing

To make this method applicable to the different data types in the data set, it is necessary to preprocess the original data set.

Data preprocessing is divided into three steps: discretizing the data, coding the attributes, and producing the coded data set.

First, we convert the numeric attributes into a discrete data set using the average region (equal-width) method, as follows.

A is a continuous attribute, n is the number of discrete values, and length is the length of each discrete interval:
length = (A(max) - A(min)) / n

Using this length, we discretize the continuous data of the numerical fields in the original data set.
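
A minimal sketch of this discretization step, assuming zero-based interval codes and that the maximum value falls into the last interval. The function name `equal_width_bins` and the sample ages are hypothetical.

```python
def equal_width_bins(values, n):
    """Map each continuous value to an interval code in 0 .. n - 1.

    length = (max - min) / n, as in the formula above.
    """
    lo, hi = min(values), max(values)
    length = (hi - lo) / n
    if length == 0:
        # All values are identical, so there is a single interval.
        return [0] * len(values), length
    # The maximum would land in interval n, so clamp it into the last one.
    codes = [min(int((v - lo) / length), n - 1) for v in values]
    return codes, length


ages = [18, 25, 32, 47, 58]
codes, length = equal_width_bins(ages, n=4)
print(length)  # 10.0
print(codes)   # [0, 0, 1, 2, 3]
```

With four intervals over the range 18 to 58, each interval is 10 wide, so 18 and 25 share code 0 and 58 is placed in the last interval.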

Data Perturbation

Data perturbation transforms each attribute value into another value from the same attribute domain with a given probability. The algorithm works as follows.

Input: Data set records |s|
The disturbance R(A) of attribute A

Output: Attribute A's data field V[n] after perturbation

Here |s| represents the number of records of attribute A in the original data set.
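
The source does not spell out the entries of R(A), so the sketch below assumes one simple form: a value is kept with probability `p` and otherwise replaced by one of the other domain values, chosen uniformly. The names `perturbation_matrix` and `perturb` are hypothetical, and attribute values are assumed to be already coded as integers 0 .. n - 1.

```python
import random


def perturbation_matrix(n, p):
    """Build an n x n matrix (n >= 2) for an attribute with n coded values.

    Row i holds the probabilities that value i is reported as each value j.
    Assumption: keep with probability p, spread 1 - p evenly over the rest.
    """
    off_diagonal = (1 - p) / (n - 1)
    return [[p if i == j else off_diagonal for j in range(n)] for i in range(n)]


def perturb(column, matrix, rng):
    """Perturb one attribute column of |s| coded values, record by record."""
    domain = list(range(len(matrix)))
    return [rng.choices(domain, weights=matrix[v])[0] for v in column]


rng = random.Random(42)
R = perturbation_matrix(n=4, p=0.7)
original = [0, 0, 1, 2, 3, 3, 1, 2]   # |s| = 8 coded records of attribute A
perturbed = perturb(original, R, rng)  # same length, values drawn from the same domain
```

Each record is perturbed independently, so the output column has the same number of records as the input and every value stays inside the attribute domain.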

CONCLUSION

The quality of a privacy-preserving data mining method depends mainly on its privacy protection metric and on the accuracy of its mining results. Most methods in the literature either sacrifice privacy in exchange for high accuracy or sacrifice accuracy for a high privacy protection metric.
