04-02-2013, 10:19 AM
Enhanced Mining of Association Rules from Data Cubes
ABSTRACT
On-line analytical processing (OLAP) provides tools to explore
and navigate data cubes in order to extract interesting
information. Nevertheless, OLAP is not capable of explaining
relationships that could exist in a data cube. Association rule
mining is a data mining technique that finds associations among
data. In this paper, we propose a framework for mining
inter-dimensional association rules from data cubes according
to a sum-based aggregate measure, which is more general than
the simple frequencies provided by the traditional COUNT
measure. Our mining process is guided by a meta-rule context
driven by analysis objectives and exploits aggregate measures
to revisit the definition of support and confidence. We also
evaluate the interestingness of mined association rules
according to the Lift and Loevinger criteria and propose an
efficient algorithm for mining inter-dimensional association
rules directly from multidimensional data.
INTRODUCTION
Data warehousing and OLAP technology have made significant
progress since the 1990s. In addition, with efficient techniques
developed for computing data cubes, OLAP users are now widely
able to explore multidimensional data, navigate through
hierarchical levels of dimensions, and therefore extract
interesting information at multiple levels of granularity in
the data. Nevertheless, OLAP technology is largely limited to
exploratory tasks and does not provide automatic tools to
explain relationships and associations within data. For example,
we can observe from a data cube that sales of sleeping bags are
particularly high in a given city. However, current OLAP tools
cannot automatically explain the causes of this particular
fact. Users are usually expected to explore the data cube
according to multiple dimensions in order to manually find an
explanation for a given phenomenon (e.g., high sales). For
instance, one possible explanation of the previous example is
to associate sales of sleeping bags with the summer season and
young tourist customers.
RELATED WORK
Association rule mining was first introduced by Agrawal et
al. [1], who were motivated by market basket analysis and
designed a framework for extracting rules from a set of
transactions related to items bought by customers. They also
proposed the Apriori algorithm, which discovers large (frequent)
itemsets satisfying the minimum support and then derives
association rules satisfying the minimum confidence. Since then,
many extensions have been developed to handle various types and
structures of data. For instance, the problem of mining
quantitative association rules from large relational tables was
first addressed in [17]. In [16], Srikant and Agrawal proposed
to mine association rules for categorical data. In [6], Han and
Fu introduced multilevel association rules, which cope with
multilevel data abstractions.
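As an illustration of the level-wise idea behind Apriori, the
following is a minimal textbook-style sketch, not the exact
procedure of [1]. The function names `apriori` and
`derive_rules` and the toy transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset with its COUNT-based support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level 1 candidates: every single item seen in the data.
    level = {frozenset([i]) for t in tx for i in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        # Keep the candidates whose support reaches the threshold.
        current = {}
        for cand in level:
            supp = sum(1 for t in tx if cand <= t) / n
            if supp >= min_support:
                current[cand] = supp
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def derive_rules(
    frequent: Dict[Itemset, float], min_confidence: float
) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Return (body, head, support, confidence) for each confident rule."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), r):
                body = frozenset(body_items)
                # The body is frequent too (downward closure), so the lookup succeeds.
                conf = supp / frequent[body]
                if conf >= min_confidence:
                    rules.append((body, itemset - body, supp, conf))
    return rules


# Hypothetical market-basket data.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
FREQUENT = apriori(TRANSACTIONS, min_support=0.5)
RULES = derive_rules(FREQUENT, min_confidence=0.9)
```

With these toy transactions, the pair {milk, butter} falls below
the support threshold, so the three-item candidate is pruned
without being counted.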
To the best of our knowledge, Kamber et al. [8] were the first
to address the issue of mining association rules from
multidimensional data. They introduced the concept of
metarule-guided mining, which uses rule templates defined by
users in order to guide the mining process. This mining process
considers precomputed data cubes and dynamic construction of
relevant data cubes. Inter-dimensional association rules with
distinct predicates are mined from single levels of dimensions.
Support and confidence are computed according to the COUNT
measure. Zhu divides the problem of mining association rules
from data cubes into three groups: inter-dimensional,
intra-dimensional, and hybrid association mining [19].
THE PROPOSED FRAMEWORK
As mentioned earlier, our proposal consists of (i) exploiting
metarule templates to mine rules from a limited subset of a
data cube, (ii) revisiting the definition of support and
confidence based on the measure values, (iii) using advanced
criteria to evaluate the interestingness of mined associations,
and (iv) proposing an Apriori-based algorithm for mining
multidimensional data.
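The formal definitions are not reproduced in this excerpt, so
the following is a minimal sketch under an assumed reading of
items (ii) and (iii): support is taken as the share of the total
measure carried by facts matching both body and head,
confidence as that sum relative to the sum for the body alone,
and Lift and Loevinger in their usual forms. The fact table,
the `sales` measure, and the function names are hypothetical.

```python
from typing import Dict, List

Fact = Dict[str, object]
Predicates = Dict[str, str]

# Hypothetical toy fact table: each row is one cell of a sales cube.
FACTS: List[Fact] = [
    {"product": "sleeping bag", "season": "summer", "customer": "young", "sales": 60.0},
    {"product": "sleeping bag", "season": "winter", "customer": "young", "sales": 10.0},
    {"product": "tent", "season": "summer", "customer": "young", "sales": 20.0},
    {"product": "tent", "season": "winter", "customer": "senior", "sales": 10.0},
]


def measure_sum(facts: List[Fact], predicates: Predicates, measure: str = "sales") -> float:
    """Sum the measure over facts matching every (dimension, value) predicate."""
    return sum(
        float(f[measure])
        for f in facts
        if all(f[dim] == value for dim, value in predicates.items())
    )


def rule_metrics(
    facts: List[Fact], body: Predicates, head: Predicates, measure: str = "sales"
) -> Dict[str, float]:
    """Sum-based support, confidence, Lift and Loevinger for body => head.

    Assumed definitions (not taken verbatim from the paper). Zero
    denominators are not handled in this sketch.
    """
    total = measure_sum(facts, {}, measure)
    sum_xy = measure_sum(facts, {**body, **head}, measure)
    sum_x = measure_sum(facts, body, measure)
    sum_y = measure_sum(facts, head, measure)

    support = sum_xy / total
    supp_x = sum_x / total
    supp_y = sum_y / total
    confidence = sum_xy / sum_x
    lift = support / (supp_x * supp_y)
    loevinger = (confidence - supp_y) / (1.0 - supp_y)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "loevinger": loevinger,
    }


METRICS = rule_metrics(
    FACTS,
    body={"season": "summer", "customer": "young"},
    head={"product": "sleeping bag"},
)
```

On this toy table, the rule (season = summer and customer =
young) => (product = sleeping bag) has a sum-based support of
0.6 and a confidence of 0.75. Using a measure column that is
constantly 1 would give back the COUNT-based definitions.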
Inter-dimensional meta-rules
As in [15], we consider two distinct subsets of dimensions in
the data cube C: (i) D_C ⊂ D is a subset of p context
dimensions. A sub-cube on C according to D_C defines the
context of the mining process; and (ii) D_A is a subset of
analysis dimensions from which predicates of an
inter-dimensional meta-rule are selected.
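The split between context and analysis dimensions can be
sketched as follows. This is only an illustration under
assumptions: the excerpt does not give the meta-rule template,
so the body/head split of the analysis dimensions, the toy
facts, and the names `context_subcube` and
`instantiate_meta_rule` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Fact = Dict[str, object]
Predicate = Tuple[str, object]
RuleInstance = Tuple[Tuple[Predicate, ...], Tuple[Predicate, ...]]

# Hypothetical facts: "city" plays the role of a context dimension,
# the other dimensions are analysis dimensions.
FACTS: List[Fact] = [
    {"city": "Lyon", "season": "summer", "customer": "young", "product": "sleeping bag", "sales": 60.0},
    {"city": "Lyon", "season": "winter", "customer": "young", "product": "sleeping bag", "sales": 10.0},
    {"city": "Lyon", "season": "summer", "customer": "young", "product": "tent", "sales": 20.0},
    {"city": "Paris", "season": "summer", "customer": "senior", "product": "tent", "sales": 30.0},
]


def context_subcube(facts: List[Fact], context: Dict[str, object]) -> List[Fact]:
    """Keep only the facts that fall inside the context sub-cube."""
    return [f for f in facts if all(f[d] == v for d, v in context.items())]


def instantiate_meta_rule(
    facts: List[Fact], body_dims: Sequence[str], head_dims: Sequence[str]
) -> List[RuleInstance]:
    """Enumerate the (body, head) predicate combinations observed in the facts.

    Each predicate is a (dimension, value) pair taken from an analysis
    dimension; every dimension appears at most once per rule, which is
    what makes the rule inter-dimensional.
    """
    seen = set()
    for f in facts:
        body = tuple((d, f[d]) for d in body_dims)
        head = tuple((d, f[d]) for d in head_dims)
        seen.add((body, head))
    return sorted(seen)


SUBCUBE = context_subcube(FACTS, {"city": "Lyon"})
CANDIDATES = instantiate_meta_rule(
    SUBCUBE, body_dims=["season", "customer"], head_dims=["product"]
)
```

Here the context restricts mining to the facts for one city, and
candidate rules are only built from the remaining analysis
dimensions. Each candidate would then be scored with support
and confidence computed inside the sub-cube.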
IMPLEMENTATION AND ALGORITHMS
We developed a Web application to mine association rules from
data cubes according to our proposal. This application is a
Mining Association Rule Module that runs on a Client/Server
platform, called MiningCubes, which already includes previous
work on mining multidimensional data [12, 13]. The platform is
equipped with a data loader component that enables connection
to multidimensional data cubes stored in the Analysis Services
of MS SQL Server 2000. By employing MDX (MultiDimensional
eXpressions) queries, this component loads information about
the structure (labels of dimensions, hierarchical levels and
measures) and the content of a user-selected data cube. The
Mining Association Rule Module allows the definition of the
parameters required to run an association rule mining process.
CONCLUSION AND PERSPECTIVES
In this paper, we establish a general framework for mining
inter-dimensional association rules from data cubes. We use an
inter-dimensional meta-rule, which allows users to limit the
mining process to a specific context defined by a particular
portion of the mined data cube. In our proposal, we provide a
general computation of support and confidence of association
rules that can be based on any measure from the data cube. This
is useful because it expresses associations that reflect a wide
range of analysis objectives and does not restrict users'
analysis to associations driven only by the traditional COUNT
measure.