10-10-2012, 03:57 PM
Automated support specification for efficient mining of interesting association rules
Automated support.pdf (Size: 256.55 KB / Downloads: 22)
Abstract.
In recent years, the weakness of the canonical support-confidence framework for associations mining has
been widely studied. One of the difficulties in applying association rules mining is the setting of support
constraint. A high-support constraint avoids the combinatorial explosion in discovering frequent itemsets,
but at the expense of missing interesting patterns of low support. Instead of seeking the way for setting the
appropriate support constraint, all current approaches leave the users to be in charge of the support
setting, which, however, puts the users in a dilemma. This paper is an effort to answer this long-standing
open question. According to the notion of confidence and lift measures, we propose an automatic support
specification for efficiently mining high-confidence and positive lift associations without consulting the
users. Experimental results show that the proposed method not only is good at discovering high-confidence
and positive lift associations, but also is effective in reducing the spurious frequent itemsets.
Introduction
Mining association rules from a large database of business data, such as transaction records, has been a hot topic
within the area of data mining. This problem is motivated by applications known as market basket analysis in
finding relationships between items purchased by customers, that is, what kinds of products tend to be purchased
together [1]. An association rule is an expression of the form A B, where A, B are sets of items. Such a rule
reveals that transactions in the database containing items in A tend to contain items in B, and the conditional
probability, measured as the fraction of transactions containing A also containing B, i.e., P(B|A) = P(A B) /
P(A), is called the confidence (conf) of the rule. The support (sup) of the rule is the fraction of the transactions
that contain all items both in A and B, i.e., sup(A B) = P(A B). For an association rule to hold, the support
and the confidence of the rule should satisfy a user-specified minimum support called minsup and minimum
confidence called minconf, respectively.
One of the difficulties in applying association rules mining is the setting of support constraint. A high-support
constraint avoids the combinatorial explosion in discovering frequent itemsets, but at the expense of missing
interesting patterns of low support. However, most rules with high support are obvious and well-known, and it is
the rules with low support that provide interesting new insight, such as deviations or exceptions.
Instead of seeking the way for setting the appropriate support constraint, all current approaches leave the users to
be in charge of the support setting, which, however, puts the users in a dilemma: how to specify the most
appropriate support constraint, either uniform or non-uniform, to discover interesting patterns without suffering
from combinatorial explosion and missing some low-support but perceptive rules. The best one can do is either
setting the support at the lowest value ever specified or performing a consecutive sequence of mining processes
with various constraints to extract the right patterns.
Problem specification and related work
In the past few years, there has been work on challenging the canonical support-confidence framework for
associations mining. These efforts can be categorized into two paradigms: extending the constant support
constraint and/or seeking substitutes for confidence measure.
The uniform support constraint was first argued in [4] while generalizing the association model into mutiple-level
associations on account of item hierarchy. In [4], Han and Fu extended the uniform support constraint to a form
of level-by-level, decreasing assignment. That is, items at the same level receive the same minimum support, and
higher level items have larger support constraint. This level-wise support specification accounts for their
progressive mining approach: An Apriori-like algorithm is performed progressively from the top level to the
bottom, and stops at the very level when no frequent itemset is generated.
Another form of association rules mining with non-uniform minimum supports was proposed by Liu et al [5].
Their method allows the users to specify different minimum supports to different items, and the support
constraint of an itemset is defined as the lowest minimum item support among the items in the itemset. The
motivation is that the supports of items are non-uniform by nature, and high profit items (e.g., TV) usually occur
less frequently than low value items (e.g., toothpaste). The multi-supported model was then extended to
generalized associations with taxonomy information in [6]. The problem remains tangling.
Interesting association rules and support specification
Interesting association rules
As concluded from the previous discussion, the primary problems of support-confidence framework are poor
predictive ability and uniform support constraint. Although substantial work has provided various methods to
alleviate these problems, such as adding the lift or conviction measure, and using non-uniform support constraint,
there is still no guideline for users in the support specification. Either the user has to, at the cost of computational
efficiency, set the support constraint low enough so as not to lose any interesting rules or risk missing new insight
patterns.
Our view for solving this problem is to discharge the end-users from specifying the support constraint. Besides,
in light of the previous researches, we adopt the lift measure and non-uniform support constraint to eliminate the
deficiency of support-confidence framework. That is, we confine ourselves to seek association rules with highconfidence
and positive lift, without the need of user-specified support constraint.
Conclusions
We have investigated in this paper the problem of mining interesting association rules without the user-specified
support constraint. We proposed a confidence-lift-based support constraint which can be automatically derived
from the item support. Empirical evaluation showed that the proposed support specification is good at
discovering the high-confidence and positive lift associations, and is effective in reducing the spurious frequent
itemsets.
Finally, we should point out that the proposed support specification still cannot find all interesting association
rules. In fact, it may miss some interesting rules composed of more than two items because the specification is
derived from frequent 2-itemsets