Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: Mining Frequent Patterns from Data streams using Dynamic DP-tree. Report
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Mining Frequent Patterns from Data streams using Dynamic DP-tree.


[attachment=39731]

ABSTRACT

A DataStream is a real time continuous, ordered sequence of items. It is impossible to control the order in which items arrive, nor it is feasible to locally store a stream in reality. By short it is a rapid flow of continuous ordered data. By these specific characteristics static models and static two pass algorithms are not suitable to data streams. Data stream mining have following three challenges one every item is examined only once. Second the storage space should control even there is a large amount of data, third the mining results have to be produced as early as possible. In this paper we propose a novel method to mine the frequent items over data streams by dividing data as no of windows and mine frequent item sets over window using a very compact data structure DP-Tree and placing the every DP-Tree safely in disk space so that we can retrieve the tree structure for pruning as and when we require. More over we propose methods to dynamically construct and update the DP-Tree

INTRODUCTION

In data streams transactions arrive continuously and the volume of transactions can be potentially infinite. Formally a data stream can be defined as a sequence of transactions. D={t1,t2,t3,….} where ti is the ith arrived transaction . to process and mine the streams , different window models are often used. A window is subsequence between i-th and j-th arrived transactions, denoted by W[i,j]=(ti , ti+1, ti+2,……,tj)., i<j The order of the stream can be maintained either implicitly by arrival time and explicitly by time stamps.
There have been many algorithms developed for fast mining of frequent patterns, which can be classified into two categories. The first category, candidate generation and test approach, such as Apriori as well as many subsequent studies, are directly based on an anti-monotone Apriori property if a pattern with k items is not frequent, any of its super-pattern with (k+1) or more items can never be frequent.

PREVIOU WORK

There are different Stream window models Sliding window, Damped window, Tilted time window and weighted sliding window. Manku and matwani (2002) propose lossy counting algorithm for computing an approximate set of FIs over the entire history of a stream. The stream is divided into a sequence of buckets and each bucket consists of B = ⌈1/ϵ⌉ transactions.Each bucket is also given an ID and the ID of the current bucket, bid , is assigned as bid = ⌈N/B⌉, where N is the number of transactions currently in the stream. Lossy Counting processes a batch of transactions arriving on the stream at a time, whereeach batch contains β buckets of transactions. Giannellia , Han.. proposed a technique for computing and maintaining all the frequent item sets in the datastreams, frequent patterns are maintained under a titied time window framework in order to answer time sensitive queries.They incrementally maintain tilted_time windows for each pattern at multiple time granularities.Moreover, inspired by the fact that the FP-tree provides an effective data structurefor frequent pattern mining, they develop Fp-stream, an effective FP-tree-based modelfor mining frequent patterns from data streams. Yun Chi, Haixum wang…et al propose a moment algorithm and in-memory data structure ,The closed enumeration tree, to monitor dynamically selected small set of item sets that enable usto answer the queries“ what are the current closed frequent item sets?”Syed Khiruzzaman Tanbeer, Chowdary Farhan Ahmed proposed CP-Tree the one pass algorithm to mine frequent item sets in large databases.

Motivation for proposed system

Apriori based algorithms take more time to mine frequent patterns because they need lot of temporary candidate item set generations. More over FP-tree is a two pass and static algorithm which required pre knowledge about the stream which may not be possible in data streams. So our proposed model Dynamic DP-tree offers an algorithm which creates and modifies the tree dynamically. Experimental results show that this will give better performance than the previous ones.

Overview of Proposed Work

After data has arrived, it will divide the data stream into partitions. First take the first partition and construct the DP-Tree(Dynamic Tree) which is a Prefix-Tree data structure. DP-Tree is efficient due to its support count descending order. DP–Tree further maintains the DFS order of the nodes. And as well as it maintains the link of same nodes. Because of these new features, DP-Tree becomes an efficient prefix tree for FP-growth algorithm to prune frequent patterns. More-over each such DP-Tree will be generated for each window of the data stream. Once a DP-Tree is generated and pruned it is stored in efficient disk based data structure called “Disk based time location table “ for future references. This mechanism of storing the DP-Tree in disk will be useful when we want to retrieve the previous window for frequent pattern mining, instead of storing data as set of transactions we are storing a DP-Tree as it is in disk memory so that whenever one want to generate frequent patterns of previous windows they can just retrieve the DP-Tree from disk and apply the DP-growth algorithm to it.

Tree Restructuring

One of the two performance factor is the tree restructuring mechanism. Existing path adjusting method (PAM) , sorts the nodes of a pre-fix tree by using bubble sort method. We propose a new tree restructuring technique called branch sorting method (BSM) that unlike PAM, restructures by sorting unsorted paths in the tree one by one in frequency descending order of I-List. The time complexity to PAM is largely depends on distance among the items that has to be swap because it follows bubble sort algorithm, the time complexity is in O(n2). On the other hand BSM uses merge sort algorithm with the time complexity of O(n log 2 n); so BSM is efficient one

Item Location Table and Disk Time Location Table

This paper after constructing a tree smoothly handles that tree for pruning as well as preserving it in disk. For smooth pruning it maintains the Item Location table wich is a Hash table and Link list combined data structure. In Hash table it contains the Item id and their support count and link to the chain of nodes where that item located in DP-Tree. This data structure can be used to efficiently handling the pruning phase. More over if we use top down FP-Growth algorithm

CONCLUSION

This a paper we propose a efficient algorithm DP-Tree based on Tilted time window model. This algorithm consists of two phases insertion and restructuring and also storing the trees smoothly in disk using Disk Time Location Table.