16-03-2012, 02:35 PM
DATA LEAKAGE DETECTION
data leakage detection.pdf (Size: 957.62 KB / Downloads: 48)
INTRODUCTION
In the course of doing business, sometimes sensitive data
must be handed over to supposedly trusted third parties.
For example, a hospital may give patient records to
researchers who will devise new treatments. Similarly, a
company may have partnerships with other companies that
require sharing customer data. Another enterprise may
outsource its data processing, so data must be given to
various other companies. We call the owner of the data the
distributor and the supposedly trusted third parties the
agents. Our goal is to detect when the distributor’s sensitive
data have been leaked by agents, and if possible to identify
the agent that leaked the data.
RELATED WORK
The guilt detection approach we present is related to the
data provenance problem [3]: tracing the lineage of the
objects in S essentially amounts to detecting the guilty
agents. The tutorial in [4] provides a good overview of the
research conducted in this field. The suggested solutions are
domain specific, such as lineage tracing for data warehouses
[5], and assume some prior knowledge of the way a
data view is created out of data sources. Our problem
formulation with objects and sets is more general and
simplifies lineage tracing, since we do not consider any data
transformation from the sets Ri to S.
AGENT GUILT MODEL
To compute Pr{Gi|S}, the probability that agent Ui is guilty given the leaked set S, we need an estimate for the
probability that values in S can be “guessed” by the target.
For instance, say that some of the objects in S are e-mails of
individuals. We can conduct an experiment and ask a
person with approximately the expertise and resources of
the target to find the e-mail of, say, 100 individuals. If this
person can find, say, 90 e-mails, then we can reasonably
guess that the probability of finding one e-mail is 0.9. On
the other hand, if the objects in question are bank account
numbers, the person may only discover, say, 20, leading to
an estimate of 0.2. We call this estimate pt, the probability
that object t can be guessed by the target.
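The experiment described above reduces to a simple frequency estimate. The sketch below codifies it; the function name and the counts (90 of 100 e-mails, 20 of 100 account numbers) come straight from the illustrative numbers in the text, not from real measurements.

```python
# Estimating p_t, the probability that the target can "guess" an object,
# from a small experiment: ask a surrogate with roughly the target's
# expertise and resources to find N objects on their own, and take the
# fraction found as the estimate.

def estimate_guess_probability(found: int, asked: int) -> float:
    """Fraction of objects the surrogate target could find unaided."""
    if asked <= 0:
        raise ValueError("need at least one trial")
    return found / asked

p_email = estimate_guess_probability(found=90, asked=100)    # -> 0.9
p_account = estimate_guess_probability(found=20, asked=100)  # -> 0.2
```

In practice one would repeat the experiment with several surrogates and average, since a single trial gives only a coarse point estimate of p_t.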
GUILT MODEL ANALYSIS
In order to see how our model parameters interact and to
check if the interactions match our intuition, in this
section, we study two simple scenarios. In each scenario,
we have a target that has obtained all the distributor’s
objects, i.e., T = S.
Impact of Probability p
In our first scenario, T contains 16 objects: all of them are
given to agent U1 and only eight are given to a second
agent U2. We calculate the probabilities Pr{G1|S} and
Pr{G2|S} for p in the range [0, 1] and we present the results
in Fig. 1a. The dashed line shows Pr{G1|S} and the solid
line shows Pr{G2|S}.
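The scenario can be sketched numerically. The guilt formula used below is an assumption made for illustration, not the paper's exact model: each object is assumed to leak independently, and an object held by |V_t| agents implicates each of them with probability (1 - p)/|V_t| when it was not guessed. The curves this produces match the intuition described: the agent holding all 16 objects always looks more suspicious than the agent holding only the 8 shared ones, and both guilt probabilities fall as p grows.

```python
# First scenario: 16 objects in T = S; agent U1 received all 16,
# agent U2 received only the first 8 (so those 8 have two holders).
# The guilt formula here is an illustrative assumption, not the
# paper's exact derivation.

def guilt_probability(received, holders, p):
    """Pr{Gi|S} under the independence assumption: the agent is guilty
    unless every object it received was either guessed (prob. p) or
    attributable to another of its holders[t] recipients."""
    pr_innocent = 1.0
    for t in received:
        pr_innocent *= 1.0 - (1.0 - p) / holders[t]
    return 1.0 - pr_innocent

objects = range(16)
holders = {t: (2 if t < 8 else 1) for t in objects}  # first 8 shared with U2

for p in (0.0, 0.5, 0.9):
    g1 = guilt_probability(objects, holders, p)   # U1 received all 16
    g2 = guilt_probability(range(8), holders, p)  # U2 received the shared 8
    print(f"p={p:.1f}  Pr{{G1|S}}={g1:.3f}  Pr{{G2|S}}={g2:.3f}")
```

At p = 0 the model blames U1 with certainty (it alone holds eight of the leaked objects), while U2's guilt stays just below 1; at p = 1 every object could have been guessed, so both guilt probabilities drop to 0, which is the qualitative shape the dashed and solid curves of Fig. 1a display.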