Fast Algorithms for Mining Association Rules pdf

**project girl** · 15-12-2012, 01:57 PM

Fast Algorithms for Mining Association Rules

.pdf

1Fast Algorithms.pdf (Size: 354.18 KB / Downloads: 27)

Abstract

We consider the problem of discovering association rules between items in a large database
of sales transactions. We present two new algorithms for solving this problem that are fundamentally
dierent from the known algorithms. Experiments with synthetic as well as real-life
data show that these algorithms outperform the known algorithms by factors ranging from three
for small problems to more than an order of magnitude for large problems. We also show how
the best features of the two proposed algorithms can be combined into a hybrid algorithm,
called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the
number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the
transaction size and the number of items in the database.

Introduction

Database mining is motivated by the decision support problem faced by most large retail organizations
[S+93]. Progress in bar-code technology has made it possible for retail organizations to
collect and store massive amounts of sales data, referred to as the basket data. A record in such
data typically consists of the transaction date and the items bought in the transaction. Successful
organizations view such databases as important pieces of the marketing infrastructure [Ass92].
They are interested in instituting information-driven marketing processes, managed by database
technology, that enable marketers to develop and implement customized marketing programs and
strategies [Ass90].
The problem of mining association rules over basket data was introduced in [AIS93b]. An
example of such a rule might be that 98% of customers that purchase tires and auto accessories also
get automotive services done. Finding all such rules is valuable for cross-marketing and attached
mailing applications. Other applications include catalog design, add-on sales, store layout, and
customer segmentation based on buying patterns. The databases involved in these applications are
very large. It is imperative, therefore, to have fast algorithms for this task.

Discovering Large Itemsets

Algorithms for discovering large itemsets make multiple passes over the data. In the rst pass, we
count the support of individual items and determine which of them are large, i.e. have minimum
support. In each subsequent pass, we start with a seed set of itemsets found to be large in the
previous pass. We use this seed set for generating new potentially large itemsets, called candidate
itemsets, and count the actual support for these candidate itemsets during the pass over the data.
At the end of the pass, we determine which of the candidate itemsets are actually large, and they
become the seed for the next pass. This process continues until no new large itemsets are found.
The Apriori and AprioriTid algorithms we propose dier fundamentally from the AIS [AIS93b]
and SETM [HS93] algorithms in terms of which candidate itemsets are counted in a pass and
in the way that those candidates are generated. In both the AIS and SETM algorithms (see
Sections 4.1 and 4.2 for a review), candidate itemsets are generated on-the-
y during the pass
as data is being read. Specically, after reading a transaction, it is determined which of the
itemsets found large in the previous pass are present in the transaction. New candidate itemsets
are generated by extending these large itemsets with other items in the transaction. However, as
we will see, the disadvantage is that this results in unnecessarily generating and counting too many
candidate itemsets that turn out to be small.

Algorithm Apriori

Figure 1 gives the Apriori algorithm. The rst pass of the algorithm simply counts item occurrences
to determine the large 1-itemsets. A subsequent pass, say pass k, consists of two phases. First, the
large itemsets Lk

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Software Crisis pdf	study tips	1	2,117	21-09-2017, 04:31 PM Last Post: jaseela123
	Design and Analysis Of Algorithms : Seminar Report and PPT	seminar projects maker	1	1,315	21-09-2017, 12:04 PM Last Post: jaseela123
	Data Mining: What is Data Mining? Report	project girl	1	2,262	21-09-2017, 11:47 AM Last Post: jaseela123
	HOW EMAIL WORKS pdf	project girl	1	3,067	20-09-2017, 11:39 AM Last Post: jaseela123
	Cyber crime detection, investigation and prosecution pdf	seminar projects maker	1	958	20-09-2017, 11:31 AM Last Post: jaseela123
	Review: Context Aware Tools for Smart Home Development pdf	study tips	1	1,227	20-09-2017, 11:22 AM Last Post: jaseela123
	Getting Started with the MAXQ1103 Evaluation Kit and the CrossWorks Compiler pdf	project girl	1	969	15-09-2017, 03:11 PM Last Post: jaseela123
	Wireless Application Protocol (WAP) pdf	project girl	1	1,531	15-09-2017, 02:42 PM Last Post: jaseela123
	MAC Protocol for Reliable Multicast over Multi-Hop Wireless Ad Hoc Networks pdf	study tips	1	1,029	15-09-2017, 12:39 PM Last Post: jaseela123
	Wireless Automotive Communications pdf	seminar projects maker	1	637	14-09-2017, 01:27 PM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.