07-06-2013, 12:33 PM
MIME: A Framework for Interactive Visual Pattern Mining
ABSTRACT
We present a framework for interactive visual pattern mining.
Our system enables the user to browse through the
data and patterns easily and intuitively, using a toolbox consisting
of interestingness measures, mining algorithms and
post-processing algorithms to assist in identifying interesting
patterns. By mining interactively, we enable the user to
combine their subjective interestingness measure and background
knowledge with a wide variety of objective measures
to easily and quickly mine the most important and interesting
patterns. Basically, we enable the user to become an
essential part of the mining algorithm. Our demo currently
applies to mining interesting itemsets and association rules,
and its extension to episodes and decision trees is ongoing.
INTRODUCTION
Data mining is an inherently iterative process; the results
of one analysis often lead to new questions, requiring more
analysis. Ideally, this process is not only iterative but also
interactive: the user can give feedback on the results immediately
and browse them easily. In traditional pattern mining, however, algorithms
typically produce large numbers of patterns, many
of which are not interesting to the user [9], and the results
are typically only given in a flat text file, making it hard to
analyze the results. By instead providing an iterative and
interactive process, the user would be able to explore and
refine the discovered patterns on the fly.
DESCRIPTION OF THE SYSTEM
We consider transactional (supermarket) databases D where
each transaction contains a number of items. A pattern is
an itemset (a group of items {A,B,C} that occur together)
or a rule (two groups of items {D,E} → {F,G,H} where
the presence of the first group implies the presence of the
second group with a given confidence). The support s of
a pattern is the number of transactions in the database in
which the pattern is present. The frequency f is the support
s relative to the size of the dataset, denoted as |D|. In this
setting, frequent itemset mining is defined as the process of
finding all patterns in a database that have a frequency f_p
higher than or equal to a user-specified threshold f_u.
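As a minimal sketch of these definitions in Java (the language the tool is written in), the following computes support, frequency, and the frequency threshold test for an itemset. The class and method names are hypothetical and not taken from MIME. The text above does not spell out how confidence is computed, so the sketch assumes the standard definition, support(X ∪ Y) / support(X).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper only; names are hypothetical and not part of MIME.
final class PatternMeasures {

    // Support s: number of transactions that contain every item of the itemset.
    static int support(List<Set<String>> database, Set<String> itemset) {
        int count = 0;
        for (Set<String> transaction : database) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // Frequency f: support relative to the database size |D|.
    static double frequency(List<Set<String>> database, Set<String> itemset) {
        if (database.isEmpty()) {
            return 0.0;
        }
        return (double) support(database, itemset) / database.size();
    }

    // An itemset is frequent if its frequency reaches the user-specified threshold.
    static boolean isFrequent(List<Set<String>> database, Set<String> itemset,
                              double threshold) {
        return frequency(database, itemset) >= threshold;
    }

    // Assumed standard definition of rule confidence:
    // support(antecedent union consequent) / support(antecedent).
    static double confidence(List<Set<String>> database, Set<String> antecedent,
                             Set<String> consequent) {
        int antecedentSupport = support(database, antecedent);
        if (antecedentSupport == 0) {
            return 0.0;
        }
        Set<String> union = new HashSet<>(antecedent);
        union.addAll(consequent);
        return (double) support(database, union) / antecedentSupport;
    }
}
```

For a toy database with the transactions {A,B,C}, {A,B}, {A,C} and {B}, the itemset {A,B} has support 2 and frequency 0.5, and the rule {A} → {B} has confidence 2/3.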
Most pattern mining techniques produce so much output
that it is difficult to post-process. One
could try to reduce the number of results by making the
quality thresholds stricter. Unfortunately, this does not
guarantee that the remaining patterns are useful, so stricter
thresholds alone are not enough.
IMPLEMENTATION
The tool has been implemented in Java using the Qt Jambi
library for visualization and the GUI.
MIME provides a framework for the exploration of patterns,
where a user can create a pattern, see the best pattern
extension, add or remove items and so obtain useful
patterns. As calculating some measures may be computationally
expensive, caching is used to minimize the
number of recalculations. Threading is used to compute
as much as possible in the background, so that the user can
keep exploring the dataset without delay. Together,
these two techniques create an
environment where interactivity remains high.
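The combination of caching and background computation can be sketched as follows. This is not MIME's actual code: the class `CachedMeasure` and its methods are hypothetical, and the sketch simply assumes that a measure is a function from an itemset to a number. Each itemset is evaluated at most once on a worker thread, and repeated requests reuse the stored result.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: caches measure values and computes them off the GUI thread.
final class CachedMeasure {
    private final Function<Set<String>, Double> measure;
    private final ExecutorService worker;
    private final ConcurrentHashMap<Set<String>, CompletableFuture<Double>> cache =
            new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    CachedMeasure(Function<Set<String>, Double> measure, ExecutorService worker) {
        this.measure = measure;
        this.worker = worker;
    }

    // Returns immediately; the value is computed in the background on first request
    // and served from the cache on every later request for an equal itemset.
    CompletableFuture<Double> evaluate(Set<String> itemset) {
        Set<String> key = Set.copyOf(itemset); // immutable copy, safe as a map key
        return cache.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> {
                    computations.incrementAndGet();
                    return measure.apply(k);
                }, worker));
    }

    // Number of times the underlying measure was actually computed.
    int computationCount() {
        return computations.get();
    }
}
```

A caller would typically attach a callback such as `thenAccept` to the returned future to update the display once the value is ready, rather than blocking on it.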
A plugin framework has also been used in order to make
our tool easily expandable. Plugins can be defined using the
configuration file. For the moment there is one constraint on
the plugins that can be used: each plugin must accept command-line
parameters for an input file and an output file, which are identified
by their position in the parameter list.
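How such a positional plugin call might look can be sketched as below. The format of MIME's configuration file is not described here, so everything in this example is an assumption: the class `PluginRunner`, the idea of an argument template whose input and output slots are overwritten by index, and the executable name used in the usage note are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of invoking an external mining tool as a plugin.
final class PluginRunner {

    // Builds the command line. inputIndex and outputIndex are 0-based positions
    // within argumentTemplate (the executable itself is not counted).
    static List<String> buildCommand(String executable, List<String> argumentTemplate,
                                     int inputIndex, int outputIndex,
                                     Path input, Path output) {
        List<String> arguments = new ArrayList<>(argumentTemplate);
        arguments.set(inputIndex, input.toString());
        arguments.set(outputIndex, output.toString());

        List<String> command = new ArrayList<>();
        command.add(executable);
        command.addAll(arguments);
        return command;
    }

    // Runs the plugin and waits for it to finish; returns its exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, with a made-up executable `miner`, the template `["-s", "10", "IN", "OUT"]`, and positions 2 and 3, the call `buildCommand("miner", template, 2, 3, Path.of("data.txt"), Path.of("patterns.txt"))` yields the command `miner -s 10 data.txt patterns.txt`.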
ILLUSTRATIVE SCENARIO
In this section we give a demonstration of the capabilities
of our tool using the stemmed version of the ICDM
Abstracts dataset by Tijl De Bie [6]. The database contains
4976 items over 859 transactions. Each transaction represents
an abstract of a paper that was published between
2001 and 2008. We start by loading the dataset in memory
using the worksheet menu.
RELATED WORK
A lot of work has been done comparing and evaluating different
objective interestingness measures [8, 11]. The most
important outcome is that there is no single measure that
can be used for all purposes, and even worse, for some purposes,
there exists no measure, and only subjective and semantic
criteria based on experience and background knowledge
can produce actionable results.
In our tool we have incorporated several objective interestingness
measures, but it is the combination of user knowledge
and objective measures that enables subjective interestingness
criteria to be applied.
Also in the context of Inductive Databases several interactive
constraint-based mining frameworks have been studied
[7]. Here, the user typically has the ability to specify all
kinds of constraints that are being used during the mining
process. In these systems, the visualization and adaptation
of the results has not been a major concern. Moreover,
our framework could be built on top of such an inductive
database implementation [3].
Most similar to the system presented here is the framework
for mining decision trees proposed by Ankerst et al. [1].
The user and computer work together in this system such
that in each step either the user or the computer can make
the decision for a new split. The computer also provides
extra computational power by showing the best split, lookahead
information, purity measures, etc. Our approach is
similar, but applies to frequent itemsets and association
rules, instead of decision trees.
CONCLUSIONS AND FUTURE WORK
MIME is a framework for the interactive mining, exploration,
and post-processing of patterns, and allows for easy
comparison of different algorithms and pattern collections.
It makes the user part of the mining process, so that patterns
can be created and adapted using different mining
algorithms and quality measures, as well as personal knowledge
and interest. Post-processing algorithms (clustering algorithms,
for instance) can be applied to the created collection,
or the user can create hierarchies manually. The tool
also contains a plugin system that allows extension of the
tool with existing software.