23-08-2012, 05:10 PM
Apriori algorithm
apriori.ppt (Size: 226 KB / Downloads: 159)
Association rules
Techniques for data mining and knowledge discovery in databases
Five important algorithms in the development of association rules (Yilmaz et al., 2003):
AIS algorithm 1993
SETM algorithm 1995
Apriori, AprioriTid and AprioriHybrid 1994
Apriori algorithm
Developed by Agrawal and Srikant 1994
Innovative way to find association rules on a large scale, allowing implication outcomes (rule consequents) that consist of more than one item
Based on minimum support threshold (already used in AIS algorithm)
Three versions:
Apriori (basic version) faster in first iterations
AprioriTid faster in later iterations
AprioriHybrid can change from Apriori to AprioriTid after first iterations
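As a rough illustration of the basic version only (not AprioriTid or AprioriHybrid), the level-wise search could be sketched as below. The function name `apriori`, the use of an absolute count for the minimum support, and the sample baskets are assumptions made for this example, not part of the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # One more pass over the data to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical shopping baskets.
baskets = [
    {"milk", "cocoa powder", "bread"},
    {"milk", "cocoa powder"},
    {"milk", "corn flakes", "bread"},
    {"corn flakes", "bread"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

Each round of the loop reads the whole data set once, which is why the number of passes grows with the size of the largest frequent itemset.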
Limitations of Apriori algorithm
Needs several passes over the data
Uses a uniform minimum support threshold
Difficulty finding rarely occurring events
Alternative methods (other than Apriori) can address this by using a non-uniform minimum support threshold
Some competing alternative approaches focus on partition and sampling
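One possible way to picture a non-uniform threshold is to give each item its own minimum support and test an itemset against the lowest threshold among its items. This is only a sketch of that idea: the function `is_frequent`, the per-item threshold table, and the default of 0.5 are hypothetical and are not taken from the slides.

```python
def is_frequent(itemset, transactions, item_min_support, default=0.5):
    """Test an itemset against the lowest per-item threshold among its items."""
    threshold = min(item_min_support.get(i, default) for i in itemset)
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions) >= threshold

# Hypothetical baskets in which "caviar" is a rare item.
baskets = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "caviar"},
    {"bread"},
]
# Assumed thresholds: the rare item gets a lower bar than the default.
thresholds = {"caviar": 0.2}

rare_with_own_threshold = is_frequent({"caviar"}, baskets, thresholds)
rare_with_uniform_threshold = is_frequent({"caviar"}, baskets, {})
```

With the uniform default the rare item is discarded, while its own lower threshold keeps it in play.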
Phases of knowledge discovery
data selection
data cleansing
data enrichment (integration with additional resources)
data transformation or encoding
data mining
reporting and display (visualization) of the discovered knowledge
(Elmasri and Navathe, 2000)
Application of data mining
Data mining is typically used with transactional databases (e.g. in shopping cart analysis)
Aim can be to build association rules about the shopping events
Based on item sets, such as
{milk, cocoa powder} 2-itemset
{milk, corn flakes, bread} 3-itemset
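To make the k-itemset notion concrete, the short sketch below lists every k-itemset contained in a single basket. The helper name `k_itemsets` is made up for this example.

```python
from itertools import combinations

def k_itemsets(transaction, k):
    """List all itemsets of size k contained in one transaction."""
    return [frozenset(c) for c in combinations(sorted(transaction), k)]

basket = {"milk", "corn flakes", "bread"}
pairs = k_itemsets(basket, 2)    # the 2-itemsets inside the basket
triples = k_itemsets(basket, 3)  # the single 3-itemset: the basket itself
```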
Association rules
Items that often occur together can be associated with each other
These co-occurring items form a frequent itemset
Conclusions based on the frequent itemsets form association rules
For example, {milk, cocoa powder} can yield the rule cocoa powder => milk
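A rule derived from a frequent itemset is commonly scored by its support and its confidence. Confidence is a standard measure that the slides do not mention, so treat its use here as an assumption. The function `rule_stats` and the sample baskets are hypothetical.

```python
def rule_stats(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    # Confidence: share of baskets with the antecedent that also hold the consequent.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical shopping baskets.
baskets = [
    {"milk", "cocoa powder"},
    {"milk", "cocoa powder", "bread"},
    {"milk", "bread"},
    {"bread"},
]
support, confidence = rule_stats({"cocoa powder"}, {"milk"}, baskets)
```

In this toy data every basket with cocoa powder also contains milk, so the rule holds whenever its antecedent appears, although it covers only half of all baskets.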
Subjective measures
Often based on earlier user experiences and beliefs
Unexpectedness: rules are interesting if they are unknown or contradict the existing knowledge (or expectations).
Actionability: rules are interesting if users can benefit from acting on them
Weak and strong beliefs
Simplicity
Focus on generating simple association rules
Length of rule can be limited by user-defined threshold
With smaller itemsets the interpretation of rules is more intuitive
Unfortunately, this can greatly increase the number of rules
Quantitative values can be quantized (e.g. into age groups)
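Quantizing a numeric attribute might look like the sketch below, which maps an age to a group label that can then be treated as an ordinary item in a transaction. The group boundaries, the label strings, and the function name `age_group` are all assumptions chosen for illustration.

```python
from bisect import bisect_right

# Assumed boundaries and labels; any other grouping would work the same way.
BOUNDS = [18, 35, 60]
LABELS = ["age<18", "age18-34", "age35-59", "age60+"]

def age_group(age):
    """Map a numeric age to a categorical label usable as an item."""
    return LABELS[bisect_right(BOUNDS, age)]

# A hypothetical customer record turned into a transaction of items.
purchase = {"milk", "cocoa powder"}
transaction = purchase | {age_group(27)}
```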