18-01-2013, 12:40 PM
A New Approach on Rare Association Rule Mining
1A New Approach.pdf (Size: 198 KB / Downloads: 40)
ABSTRACT
Association rule mining is the process of finding some relations
among the attributes/attribute values of huge database
based on support value. Most existing association mining techniques
are developed to generate frequent rules based on frequent
itemsets generated on market basket datasets. A common
property of these techniques is that they extract frequent itemsets
and prune the infrequent itemsets. However, such infrequent or
rare itemsets and consequently the rare rules may provide valuable
information. So, many applications demand to mine such
rare association rules which have low support but higher confidence.
This paper presents a method to generate both frequent
and rare itemsets and consequently the rules. The effectiveness
of the rules has been validated over several real life datasets.
INTRODUCTION
Association mining is one of the important tasks of data mining
intended towards decision support. Basically it is the process
of finding some relations among the attributes/attribute values
of huge database. Inside the huge collection of data stored in a
database, some kind of relationships among the various attributes
may exist. Discovering such relationships may help in some decision
making process significantly. However, extraction of such
relationship from large dataset is not a trivial task. The process
of extracting these relationships is termed as association rule
mining and can be represented in IF-THEN form. Association
rule mining problem was introduced by Agrawal[1] that works
on a binary dataset, termed as market basket, where each attribute
is termed as an item. In the last two decades several novel
works have been evolved to handle the association rule mining
problem[2][8][15]. All these algorithms are based on a supportconfidence
framework. They work in two phases, namely frequent
itemset generation and rule generation. The first phase
explains the concept of support to derive the frequent itemsets.
Support of an itemset can be defined as the proportion of transactions
in the dataset which contain the itemset.
RARE ASSOCIATION RULE GENERATION
In the past couple of years several novel
algorithms[1][3][7][11][12][18] have been developed to
extract strong association rules, fulfilling the minimum support
and minimum confidence requirements. These algorithms are
mainly focused on the frequent itemsets generation phase and
capable of finding only the frequent itemsets from the dataset,
and it drops the infrequent itemsets. But, the main goal here is to
generate the rare rules which might give valuable information.
Some algorithms were developed for extracting rare itemsets
and/or frequent itemsets from a dataset. In this section some
of those algorithms are discussed. To describe the algorithms,
symbols and notations used are given in Table 1.
Complexity Analysis
The overall complexity of the algorithm is basically depends on
the size of the dataset and the user defined parameter Minsup.
For any dataset of n attributes and m records, the approximate
complexity is O(nk+1 m), where k is the maximum length
itemset.
Proof of Correctness
In this section we establish that NBD-Apriori-FR is correct in
generating both frequent and rare itemsets. Following lemma
provides the proof of correctness of our NBD-Apriori-FR.
Lemma 1: NBD-Apriori-FR is correct i.e the itemsets generated
by the algorithm are either frequent or rare with reference to the
user defined threshold Minsup.
Proof: The correctness of NBD-Apriori-FR can be established
from the fact that it generates the final list of frequent and rare
itemsets based on three candidate lists i.e zero, frequent and rare
with reference a user defined threshold i.e Minsup. An itemset is
put in the final list of frequent and rare itemset iff it satisfies the
Minsup condition, hence the proof.
CONCLUSION AND FUTURE WORK
Several frequent and rare association rule mining techniques
have been studied and reported in this paper. A general comparison
among these techniques also has been reported to highlight
their pros and cons. To address the limitations of these
techniques, an effective Apriori based frequent and rare itemset
finding technique has been presented. The superiority of the
technique has been established in terms of seven datasets while
comparing with its other competing algorithms. Finally, to address
the limitation of any support-confidence based single objective
rare rule mining method, this paper introduces a multiobjective
GA based rare rule generation method. The effectiveness
of the method has been shown over three publicly available
UCI datasets. Work is going on for further enhancement of
our MOGA (Multi-Objective GA) based rare association mining
method by considering measures other than confidence, interestingness
and comprehensiveness, and for datasets with large number
of high-dimensional instances to help finding only meaningful
rare association rules. Attempt is also going on to explore
the possibility of applying both the methods in network anomaly
detection towards identification of known as well as unknown
attacks.