07-08-2012, 01:04 PM
Exact Knowledge Hiding through Database Extension
Abstract:
In this paper, we propose a novel, exact border-based approach that provides an optimal solution for hiding sensitive frequent item sets by
1) minimally extending the original database by a synthetically generated database part (the database extension),
2) formulating the creation of the database extension as a constraint satisfaction problem,
3) mapping the constraint satisfaction problem to an equivalent binary integer programming problem,
4) exploiting underutilized synthetic transactions to proportionally increase the support of nonsensitive item sets,
5) minimally relaxing the constraint satisfaction problem to provide an approximate solution close to the optimal one when an ideal solution does not exist, and
6) using a partitioning of the universe of items to increase the efficiency of the proposed hiding algorithm.
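Steps 1 to 3 can be illustrated on a toy database. The sketch below assumes transactions are Python frozensets and that `mfreq` is the minimum frequency threshold. It computes a lower bound on the size Q of the extension: synthetic transactions can only add support, so a sensitive item set with support sup in a database of N transactions stays frequent unless sup / (N + Q) < mfreq, giving Q = ⌊sup / mfreq − N⌋ + 1 for the most frequent sensitive item set. It then searches for an extension after which the frequent item sets are exactly the originally frequent ones minus the sensitive item sets and their supersets, with no previously infrequent item set becoming frequent. The exhaustive search is a hypothetical stand-in for the paper's binary integer programming formulation and is only feasible for tiny item universes. All function names are illustrative, and trying larger Q when no exact solution exists is a simplification (the paper instead relaxes the problem, as in step 5).

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement
from math import floor


def support(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def all_itemsets(items):
    """Every non-empty subset of the item universe, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def frequent(db, items, mfreq):
    """Item sets whose relative support in db is at least mfreq."""
    return {i for i in all_itemsets(items) if support(i, db) >= mfreq * len(db)}


def min_extension_size(db, sensitive, mfreq):
    """Lower bound Q on |D_X|: Q = floor(max sup / mfreq - N) + 1."""
    max_sup = max(support(s, db) for s in sensitive)
    return floor(Fraction(max_sup) / mfreq - len(db)) + 1


def exact_extension(db, items, sensitive, mfreq, max_extra=3):
    """Brute-force search (hypothetical stand-in for the BIP) for synthetic
    transactions D_X such that the frequent item sets of D + D_X are exactly
    the originally frequent item sets that contain no sensitive item set."""
    target = {i for i in frequent(db, items, mfreq)
              if not any(s <= i for s in sensitive)}
    # Candidate synthetic transactions: every subset of the universe, even the empty one.
    candidates = [frozenset()] + all_itemsets(items)
    q_min = min_extension_size(db, sensitive, mfreq)
    for q in range(q_min, q_min + max_extra + 1):
        for ext in combinations_with_replacement(candidates, q):
            if frequent(db + list(ext), items, mfreq) == target:
                return list(ext)
    return None  # no exact extension within the searched sizes


if __name__ == "__main__":
    # Hypothetical toy database over items a-d, with mfreq = 40% and {a, b} sensitive.
    db = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abd")]
    sensitive = [frozenset("ab")]
    mfreq = Fraction(2, 5)
    print(min_extension_size(db, sensitive, mfreq))  # → 3
    ext = exact_extension(db, set("abcd"), sensitive, mfreq)
    print(["".join(sorted(t)) for t in ext])  # → ['ac', 'ac', 'bc', 'bc']
```

In this example the bound gives Q = 3, but no exact extension of that size exists: {a, c} and {b, c} each need two more supporting transactions, and no synthetic transaction may contain both a and b, so four transactions are required. Using `Fraction` keeps the threshold comparisons exact instead of relying on floating point.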
Existing System:
Advances in data collection, processing, and analysis, along with privacy concerns regarding the misuse of the knowledge induced from this data, soon brought into existence the field of privacy preserving data mining. Simple de-identification of the data prior to mining is insufficient to guarantee a privacy-aware outcome, since intelligent analysis of the data through inference-based attacks may reveal sensitive patterns that were unknown to the database owner before the data was mined. Thus, compliance with privacy regulations requires the incorporation of advanced and sophisticated solutions.
This paper concentrates on a subfield of privacy preserving data mining known as “knowledge hiding.” Suppose that, as purchasing directors of BigMart, a large supermarket chain, we are negotiating a deal with Dedtrees Paper Company. They offer their products at a reduced price if we agree to give them access to our database of customer purchases. We accept the deal, and Dedtrees starts mining our data. Using an association rule mining tool, they find that people who purchase skim milk also purchase Green paper. Dedtrees then runs a marketing campaign that exploits this rule, offering a discount on skim milk with every purchase of a Dedtrees product. This campaign cuts heavily into the sales of Green paper, which raises the prices it charges us because of the lower sales.
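To make the mined rule concrete, the sketch below computes the support and confidence of {skim milk} → {Green paper} over a handful of purchase records. The baskets and the helper name `rule_stats` are hypothetical; only the rule comes from the example. It also shows, under these assumptions, how appending synthetic baskets that lack the rule's items dilutes its support, which is the effect a database extension relies on.

```python
from fractions import Fraction


def rule_stats(db, lhs, rhs):
    """Return (support, confidence) of the association rule lhs -> rhs over db."""
    both = sum(1 for t in db if lhs | rhs <= t)   # baskets containing lhs and rhs
    lhs_count = sum(1 for t in db if lhs <= t)    # baskets containing lhs
    support = Fraction(both, len(db))
    confidence = Fraction(both, lhs_count) if lhs_count else Fraction(0)
    return support, confidence


# Hypothetical purchase records.
baskets = [
    {"skim milk", "Green paper", "bread"},
    {"skim milk", "Green paper"},
    {"skim milk", "eggs"},
    {"bread", "eggs"},
]
rule = ({"skim milk"}, {"Green paper"})
print(rule_stats(baskets, *rule))  # → (Fraction(1, 2), Fraction(2, 3))

# Synthetic baskets without skim milk or Green paper lower the rule's support;
# its confidence is unchanged because no new basket contains the left-hand side.
extended = baskets + [{"bread"}, {"eggs"}]
print(rule_stats(extended, *rule))  # → (Fraction(1, 3), Fraction(2, 3))
```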
Proposed System:
The presented methodology lies at the intersection of frequent item set hiding and synthetic database generation (the latter examined in the context of privacy preservation). To the best of our knowledge, apart from ongoing research on an additive model for sensitive item set hiding, this approach is the first to facilitate knowledge hiding through the extension of the database.
Extending the original database to accommodate knowledge hiding can be viewed as a bridge between the item set hiding and the synthetic database generation approaches. In what follows, we review some of the fundamental related work in both research directions.