26-06-2013, 04:56 PM
Mining Closed Sequences with Constraint Based on BIDE Algorithm Report
Abstract
Mining sequential patterns is a common data mining task in many real-life applications. Existing algorithms such as CAMLS (Constraint-based Apriori Algorithm for Mining Long Sequences) mine the complete set of frequent long sequences that satisfy a min-sup threshold in a sequence database. However, mining long sequences generates an explosive number of frequent sequences, which is prohibitively costly in both run time and storage space. In this paper, we propose to improve the CAMLS algorithm so that it produces only closed sequences. Instead of mining the full set of sequences, we plan to mine only the closed sequences, i.e., those that have no super-sequence with the same support. Our motivation is to mine closed sequences from long sequences by combining the BIDE algorithm with an improved CAMLS algorithm, and to make the pruning strategy even more efficient. BIDE is an efficient algorithm for mining closed sequences that works without the candidate maintenance-and-test paradigm.
Existing System:
Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently across different sequences. Well-known algorithms such as CAMLS (Constraint-based Apriori Mining of Long Sequences), an algorithm for mining long sequential patterns under constraints, have proved to be efficient. However, these algorithms do not perform well on databases that yield a large number of patterns or long patterns, because CAMLS is an Apriori-based algorithm that employs an iterative process of candidate generation followed by frequency testing.
Disadvantages:
• Candidate set generation is still costly, especially when there are a large number of patterns and/or long patterns.
• When the support threshold is low or the patterns become long, run time and space usage become costly.
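To make the cost of candidate generation followed by frequency testing concrete, here is a minimal Python sketch of a level-wise, Apriori-style sequence miner. It is not CAMLS itself: it handles no constraints, and it assumes each sequence is a plain tuple of single items. The names `is_subsequence`, `support`, and `apriori_sequences` are made up for this illustration.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Number of database sequences that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def apriori_sequences(database: List[Sequence], min_sup: int):
    """Level-wise mining: test every candidate, then generate the next level."""
    items = sorted({item for seq in database for item in seq})
    frequent: Dict[Sequence, int] = {}
    candidates_tested = 0
    level = [(item,) for item in items]
    while level:
        survivors = []
        for candidate in level:
            # Frequency test: one pass over the database per candidate.
            candidates_tested += 1
            sup = support(candidate, database)
            if sup >= min_sup:
                frequent[candidate] = sup
                survivors.append(candidate)
        frequent_items = [i for i in items if (i,) in frequent]
        # Candidate generation: extend each survivor by one frequent item,
        # keeping the candidate only if its suffix is already known frequent.
        level = [
            p + (i,)
            for p in survivors
            for i in frequent_items
            if p[1:] + (i,) in frequent
        ]
    return frequent, candidates_tested


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent, tested = apriori_sequences(db, min_sup=2)
print(len(frequent), "frequent sequences from", tested, "candidates tested")
```

Even on this four-sequence toy database, more candidates are tested than patterns are kept, and each test scans the whole database. That gap grows quickly as patterns get longer or the support threshold drops.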
Proposed System:
In this project, we present an improved CAMLS algorithm that produces only closed sequences and makes the pruning strategy even more efficient. BIDE, another efficient algorithm, is used to find closed sequences from long sequences.
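As a rough illustration of what "closed" means, the Python sketch below filters a complete set of frequent sequences down to the closed ones. It is a naive post-processing check, not BIDE: BIDE avoids keeping the full candidate set, whereas this filter needs it. The function names and the sample supports are assumptions chosen for the example.

```python
from typing import Dict, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def closed_patterns(frequent: Dict[Sequence, int]) -> Dict[Sequence, int]:
    """Keep a pattern only if no longer super-sequence has the same support.

    Assumes `frequent` holds the complete set of frequent sequences.
    """
    closed = {}
    for pattern, sup in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_sup == sup
            and is_subsequence(pattern, other)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed


# Hypothetical frequent sequences with their supports.
frequent = {
    ("a",): 3, ("b",): 3, ("c",): 4,
    ("a", "b"): 2, ("a", "c"): 3, ("b", "c"): 3,
    ("a", "b", "c"): 2,
}
closed = closed_patterns(frequent)
print(sorted(closed.items()))
```

Here ("a",) is dropped because ("a", "c") has the same support of 3, while ("c",) stays because every super-sequence of it has lower support. The supports of the dropped patterns can still be recovered from the closed set, which is why mining only closed sequences loses no information.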
We propose a frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Efficiency of mining is achieved with three techniques:
(1) A large database is compressed into a condensed, smaller data structure, the FP-tree, which avoids costly, repeated database scans.
(2) Our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets.
(3) A partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space.
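The three techniques above can be sketched in a few lines of Python. This is a minimal, unoptimized sketch of FP-tree construction and FP-growth over unordered transactions (itemsets rather than sequences). The class and function names (`Node`, `build_tree`, `fp_growth`) and the weighted-transaction input format are assumptions of this example, not part of the report.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_sup):
    """Compress weighted transactions [(items, count), ...] into an FP-tree."""
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    keep = {i: c for i, c in counts.items() if c >= min_sup}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, n in transactions:
        # Most frequent items first, so common prefixes share tree nodes.
        ordered = sorted(
            (i for i in set(items) if i in keep), key=lambda i: (-keep[i], i)
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, keep


def fp_growth(transactions, min_sup, suffix=frozenset()):
    """Grow patterns from suffix by recursing on conditional databases."""
    patterns = {}
    header, keep = build_tree(transactions, min_sup)
    for item, nodes in header.items():
        pattern = suffix | {item}
        patterns[pattern] = keep[item]
        # Conditional database: the prefix path above each node of this item,
        # weighted by that node's count. No candidate sets are generated.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_sup, pattern))
    return patterns


transactions = [(["a", "b", "c"], 1), (["a", "c"], 1),
                (["a", "b", "c"], 1), (["b", "c"], 1)]
patterns = fp_growth(transactions, min_sup=2)
for itemset, sup in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```

Each recursive call rebuilds a small tree from a conditional database. This keeps the sketch short, but a production implementation would usually reuse node links instead of rebuilding.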