04-06-2012, 11:13 AM
GRAPH MINING
Graph Mining.Pdf (Size: 895.71 KB / Downloads: 33)
INTRODUCTION
Informally, a graph is set of nodes, pairs of which might be connected by edges. In
a wide array of disciplines, data can be intuitively cast into this format. For example,
computer networks consist of routers/computers (nodes) and the links (edges) between them.
Social networks consist of individuals and their interconnections (which could be business
relationships or kinship or trust, etc.) Protein interaction networks link proteins which must
work together to perform some particular biological function. Ecological food webs link
species with predator-prey relationships. In these and many other fields, graphs are
seemingly ubiquitous.
Detection of abnormal subgraphs/edges/nodes:
Abnormalities should deviate from the normal patterns so understanding the
patterns of naturally occurring graphs is a prerequisite for detection of such outliers.
Simulation studies:
Algorithms meant for large real-world graphs can be tested on synthetic graphs
which look like the original graphs. For example, in order to test the next-generation
Internet protocol, we would like to simulate it on a graph that is similar to what the
Internet will look like a few years into the future.
Realism of samples:
We might want to build a small sample graph that is similar to a given large
graph. This smaller graph needs to match the patterns of the large graph to be realistic.
Graph compression:
Graph patterns represent regularities in the data. Such regularities can be used to
better compress the data.
Thus, we need to detect patterns in graphs and then generate synthetic graphs matching
such patterns automatically. So our main focus of study is on Graph patterns and Graph
generators.
GRAPH PATTERNS
While there are many differences between graphs, some patterns show up regularly.
Work has focused on finding several such patterns, which together characterize naturally
occurring graphs. The main ones appear to be:
i. power laws,
ii. small diameters, and
iii. community effect
Power Laws
While the Gaussian distribution is common in nature, there are many cases where the
probability of events far to the right of the mean is significantly higher than in Gaussians. In
the Internet, for example, most routers have a very low degree (perhaps home routers), while
a few routers have an extremely high degree (perhaps the core routers of the Internet
backbone). Power-law distributions attempt to model this.