27-06-2012, 12:58 PM
Discovering Diverse-Frequent Patterns in Transactional Databases
Discovering Diverse.pdf (Size: 150.94 KB / Downloads: 59)
Introduction
Frequent pattern mining extracts interesting associations
among the items in a transactional database. In this paper,
we propose a new interestingness measure called DiverseRank
to rank the frequent patterns by considering the categories
of items within it. We first explain the basic model
of frequent patterns and discuss the motivation for the proposed
approach.
The basic model of frequent patterns is as follows [1].
Let I = fi1; i2; ; ing be a set of items, and DB be a
database that consists of a set of transactions. Each transaction
T contains a set of items such that T I. Each transaction
is associated with an identifier, called TID. Let X I
be a set of items, referred as an itemset or a pattern. A pattern
that contains k items is a k-pattern.
Diverse-Frequent Patterns
In this section, we first explain the concept of diversefrequent
patterns. Next, we propose the interestingness
measure, DiverseRank, to rank the frequent patterns based
on the items’ categories within it. Subsequently, we present
the item-encoding technique and the diverse-frequent pattern
extraction algorithm.
Diverse Rank
The following assumptions have been made on the structure
of the concept hierarchy.
The concept hierarchy is represented as a tree of
height h (the length of the longest path from root to
a leaf node). The height of the root is 0 and the height
of leaf is h.
All the items of a transaction are at the leaf node.
All the leaf nodes in the tree are present at the same
height h. That is, we are considering only the balanced
concept hierarchies. The issue of extracting
the diverse-frequent patterns for the imbalanced hierarchies
will be investigated as a part of future work.
Level Factor
It can be noted that the frequent pattern which merges to a
few higher level items after crossing more levels should receive
more diversity value than the frequent pattern which
merges to a few higher level items in lesser number of levels.
To ensure this, appropriate weight should be added to
the MF of the pattern at each level, which is called Level
Factor (LF). The LF at a given level can be computed using
various methods. We are proposing two methods.
Experiments and Results
Since there is no existing approach to discover diversefrequent
patterns, we only carry out the experiment to extract
the diverse-frequent patterns and analyze how they
differ from the frequent patterns. The experiments were
carried out on the classical R “groceries” market basket
analysis data set [6]. The groceries data set contains 30
days of point-of-sale transaction data from a typical local
grocery outlet. The data set contains 9,835 transactions
and 169 items. The average transaction size in the data
set is 4:4. The maximum and minimum transaction size is
32 and 1 respectively.