05-09-2012, 03:19 PM
Clustering Time Series Data Stream – A Literature Survey
Abstract
Interest in mining time series data has grown
tremendously in recent years. This paper studies and
summarizes various implementations to identify the
problems in existing applications. Clustering time series is
a problem that has applications in a wide range of fields and
has recently attracted a large amount of research. Time series
data are frequently large and may contain outliers. In
addition, time series are a special type of data set whose
elements have a temporal ordering. Clustering such data
streams is therefore an important issue in the data mining
process. Numerous techniques and clustering algorithms
have been proposed to support the clustering of time series
data streams. The clustering algorithms and their
effectiveness in various applications are compared, with a
view to developing a new method that addresses the existing
problems. This paper presents a survey of the clustering
algorithms available for time series datasets.
INTRODUCTION
Time series data management has become an active
research topic among data miners, and the clustering of
time series in particular has attracted the interest of
researchers. Data mining is usually constrained by three
limited resources: time, memory, and sample size.
Recently, time and memory have become the bottleneck for
machine learning applications. Clustering is an unsupervised
learning process that groups a dataset into subgroups. A
data stream is an ordered sequence of points x1, ..., xn
that can be read or accessed only once or a small
number of times. A time series is a sequence of real
numbers, each indicating a value at a time point.
In recent real-world applications, data flows continuously
from a data stream at high speed, producing more examples
over time.
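
The read-once constraint on a data stream can be illustrated with a
small sketch. The code below is not taken from the surveyed paper; it
assumes a sequential (MacQueen-style) k-means update, in which each
point is seen exactly once and only the k centroids and their counts
are kept in memory. The function name sequential_kmeans and the choice
to seed the centroids with the first k points are illustrative
assumptions.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """One-pass k-means sketch: every point in the stream is read exactly once."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        p = [float(v) for v in point]
        if len(centroids) < k:
            # Assumption: seed the centroids with the first k points seen.
            centroids.append(p)
            counts.append(1)
            continue
        # Assign the point to the nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((c - a) ** 2 for c, a in zip(centroids[i], p)),
        )
        counts[nearest] += 1
        # Incremental mean update: c <- c + (p - c) / n
        centroids[nearest] = [
            c + (a - c) / counts[nearest] for c, a in zip(centroids[nearest], p)
        ]
    return centroids


if __name__ == "__main__":
    # A generator stands in for a stream that cannot be rewound.
    points = iter([(0, 0), (10, 10), (1, 1), (11, 11), (0, 1), (10, 11)])
    print(sequential_kmeans(points, k=2))
```

Memory use depends only on k and the point dimension, not on the
length of the stream. The result is sensitive to the order of arrival
and to the seeding, which is one reason stream clustering methods tend
to be more elaborate than this sketch.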
FUTURE WORK
Clustering time series data is a difficult task with
applications in a wide range of fields, and it
has recently attracted a large amount of research. This
study investigates the existing
algorithms and techniques for clustering time series data
streams and helps to give directions for future
enhancement. Future research can be directed to the
following aspects:
1. Cluster high-dimensional time series data at higher
speed.
2. Computational efficiency can be increased for
high-dimensional data using the clipping technique.
3. An effective approach can be developed to predict
future values in time series data.
4. Time series data are usually handled in raw format, which
is expensive in terms of processing and storage. Future
work could adopt a new time series data format to
address this problem.
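
The clipping technique mentioned in item 2 can be sketched as follows.
The excerpt does not define clipping, so this example assumes the
common formulation in which each value is replaced by 1 if it lies
above the series mean and by 0 otherwise. The function names
clip_series and hamming_distance are illustrative, and comparing
clipped series by Hamming distance is one possible choice rather than
the paper's stated method.

```python
from typing import List, Sequence


def clip_series(series: Sequence[float]) -> List[int]:
    """Clip a real-valued series to bits: 1 if above the series mean, else 0."""
    if not series:
        raise ValueError("series must not be empty")
    mean = sum(series) / len(series)
    return [1 if x > mean else 0 for x in series]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length clipped series differ."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    s1 = clip_series([1.0, 5.0, 2.0, 6.0])  # mean is 3.5
    s2 = clip_series([4.0, 1.0, 5.0, 2.0])  # mean is 3.0
    print(s1, s2, hamming_distance(s1, s2))
```

Each value is reduced to a single bit, so a clipped series needs far
less storage than the raw series and can be compared with cheap
bitwise operations, at the cost of discarding the magnitude of each
value.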
CONCLUSION
In recent years, the management and processing of so-called
data streams has become a subject of active
research in numerous fields of computer science,
e.g., distributed systems, database systems, and data
mining. A lot of research has been carried out in this field
to develop efficient clustering algorithms for time series
data streams. Time series data are frequently large and may
contain outliers, so careful examination of the
previously proposed algorithms is necessary. In this paper we
surveyed current studies on time series clustering. These
studies are organized into several categories depending on
whether they work directly with the original data. Most
clustering algorithms cannot distinguish
between real and random patterns. In addition, this paper
discusses possible high-dimensionality problems with
time series data.