14-02-2013, 02:59 PM
APRIORI Algorithm
1APRIORI.pdf (Size: 173.88 KB / Downloads: 91)
The Apriori Algorithm: Basics
The Apriori Algorithmis an influential algorithm for mining frequent itemsets for boolean association rules.
Key Concepts :
•Frequent Itemsets: The sets of item which has minimum support (denoted by Lifor ith-Itemset).
•Apriori Property: Any subset of frequent itemset must be frequent.
•Join Operation: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1with itself.
The Apriori Algorithm in a Nutshell
•Find the frequent itemsets: the sets of items that have minimum support
–A subset of a frequent itemset must also be a frequent itemset
•i.e., if {AB} isa frequent itemset, both {A} and {B} should be a frequent itemset
–Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)
•Use the frequent itemsets to generate association rules.
The Apriori Algorithm: Example
•Consider a database, D , consisting of 9 transactions.
•Suppose min. support count required is 2 (i.e. min_sup = 2/9 = 22 % )
•Let minimum confidence required is 70%.
•We have to first find out the frequent itemset using Apriori algorithm.
•Then, Association rules will be generated using min. support & min. confidence.
Generating 2-itemset Frequent Pattern
•To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 Join L1to generate a candidate set of 2-itemsets, C2.
•Next, the transactions in D are scanned and the support count for each candidate itemset in C2is accumulated (as shown in the middle table).
•The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2having minimum support.
•Note:We haven’t used Apriori Property yet.
FP-Growth Method: Construction of FP-Tree
•First, create the rootof the tree, labeled with “null”.
•Scan the database D a second time. (First time we scanned it to create 1-itemset and then L).
•The items in each transaction are processed in L order (i.e. sorted order).
•A branch is created for each transactionwith items having their support count separated by colon.
•Whenever the same node is encountered in another transaction, wejust increment the support count of the common node or Prefix.
•To facilitate tree traversal, an item header tableis built so that each item points to its occurrences in the tree via a chain of node-links.
•Now, The problem of mining frequent patterns in database is transformed to that of mining the FP-Tree.