14-02-2011, 10:01 AM
presented by:
Chedy Ra¨ıssi,Pascal Poncelet,Maguelonne Teisseire
ABSTRACT
Many recent real-world applications, such as network traf-_c monitoring, intrusion detection systems, sensor networkdata analysis, click stream mining and dynamic tracing of_nancial transactions, call for studying a new kind of data.Called stream data, this model is, in fact, a continuous, potentiallyin_nite ow of information as opposed to _nite,statically stored data sets extensively studied by researchersof the data mining community. An important applicationis to mine data streams for interesting patterns or anomaliesas they happen. For data stream applications, the volumeof data is usually too huge to be stored on permanentdevices, main memory or to be scanned thoroughly morethan once. We thus need to introduce approximations whenexecuting queries and performing mining tasks over rapiddata streams. In this paper we propose a new approach,called Speed (Sequential Patterns E_cient Extraction inData streams), to identify frequent maximal sequential patternsin a data stream. To the best of our knowledge this isthe _rst approach de_ned for mining sequential patterns instreaming data. The main originality of our mining methodis that we use a novel data structure to maintain frequentsequential patterns coupled with a fast pruning strategy. Atany time, users can issue requests for frequent maximal sequencesover an arbitrary time interval. Furthermore, ourapproach produces an approximate support answer with anassurance that it will not bypass a user-de_ned frequencyerror threshold. Finally the proposed method is analyzedby a series of experiments on di_erent datasets
INTRODUCTION
Recently, the data mining community has focused on a newchallenging model where data arrives sequentially in theform of continuous rapid streams. It is often referred to asdata streams or streaming data. Since data streams are continuous,high-speed and unbounded, it is impossible to mineassociation rules by using algorithms that require multiplescans. As a consequence new approaches were proposed tomaintain itemsets [7, 4, 3, 6, 13]. Nevertheless, according tothe de_nition of itemsets, they consider that there is no limitationon items order. In this paper we consider that itemsare really ordered into the streams, therefore we are interestedin mining sequences rather than itemsets. We proposea new approach, called Speed (Sequential Patterns E_cientExtraction in Data streams), to mine sequential patterns ina data stream. The main originality of our approach is thatwe use a novel data structure to incrementally maintain frequentsequential patterns (with the help of tilted-time windows)coupled with a fast pruning strategy. At any time,users can issue requests for frequent sequences over an arbitrarytime interval. Furthermore, our approach produces anapproximate support answer with an assurance that it willnot bypass a user-de_ned frequency thresholds.The remainder of the paper is organized as follows. Section2 goes deeper into presenting the problem statement. InSection 3 we propose a brief overview of related work.TheSpeed approach is presented in Section 4. Section 5 reportsthe result of our experiments. In Section 6, we conclude thepaper.
download full report
http://hal.archives-ouvertes.fr/docs/00/...-SPEED.pdf