13-11-2012, 01:59 PM
Privacy Preserving Decision Tree Learning Using Unrealized Data Sets
Abstract
Privacy preservation is important for machine learning
and data mining, but measures designed to protect
private information often result in a trade-off: reduced
utility of the training samples. This paper introduces a
privacy preserving approach that can be applied to
decision tree learning, without concomitant loss of
accuracy. It describes how to preserve the privacy of
collected data samples in cases where information from
the sample database has been partially lost. This
approach converts the original sample data
sets into a group of unreal data sets, from which the
original samples cannot be reconstructed without the
entire group of unreal data sets. Meanwhile, an accurate
decision tree can be built directly from those unreal data
sets. This novel approach can be applied directly to the
data storage as soon as the first sample is collected. The
approach is compatible with other privacy preserving
approaches, such as cryptography, for extra protection.
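
The abstract does not spell out how the conversion works, so the following is only a minimal sketch of one way the idea could look, not the paper's algorithm. It assumes a complementation scheme: a "universal set" holds every possible combination of attribute values, and two stored multisets together equal q copies of that universal set minus the original samples. The names universal_set, unrealize and original_count, and the weather-style attributes, are hypothetical and chosen for illustration. Under these assumptions, the counts a decision tree learner needs can be recovered from the two stored sets together, but not from either one alone.

```python
from collections import Counter
from itertools import product


def universal_set(domains):
    """Multiset holding every combination of attribute values once."""
    return Counter(product(*domains))


def unrealize(samples, domains):
    """Convert samples into two stored multisets (hypothetical scheme).

    Returns (perturbing, unreal, q) with the invariant
        perturbing + unreal == q * universal_set - samples.
    Assumes the universal set contains at least two records.
    """
    universe = universal_set(domains)
    perturbing, unreal, q = Counter(), Counter(), 0
    for t in samples:
        if t not in universe:
            raise ValueError(f"sample {t!r} lies outside the given domains")
        # Top up with a full copy of the universal set when t is absent,
        # or when removing t would leave nothing to move afterwards.
        if perturbing[t] == 0 or sum(perturbing.values()) == 1:
            perturbing.update(universe)
            q += 1
        perturbing[t] -= 1  # the real sample is never stored
        # Move one of the most frequent remaining records to the unreal set.
        t_prime, _ = (+perturbing).most_common(1)[0]
        perturbing[t_prime] -= 1
        unreal[t_prime] += 1
    return +perturbing, unreal, q  # unary plus drops zero counts


def original_count(predicate, perturbing, unreal, q, domains):
    """Count original samples matching predicate from the stored sets."""
    universe = universal_set(domains)
    in_universe = sum(n for r, n in universe.items() if predicate(r))
    stored = perturbing + unreal
    in_stored = sum(n for r, n in stored.items() if predicate(r))
    return q * in_universe - in_stored


# Illustrative data: (outlook, play) records.
domains = [("sunny", "rain"), ("yes", "no")]
samples = [("sunny", "yes"), ("sunny", "yes"), ("rain", "no")]
perturbing, unreal, q = unrealize(samples, domains)
sunny = original_count(lambda r: r[0] == "sunny", perturbing, unreal, q, domains)
print("stored sets:", dict(perturbing), dict(unreal), "q =", q)
print("original samples with outlook=sunny:", sunny)
```

Because class and attribute counts are all a split criterion such as information gain requires, a tree learner could call a helper like original_count instead of scanning the raw samples. This sketch makes no claim about the strength of the privacy guarantee.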