
CHALLENGES, ISSUES AND APPLICATIONS IN SPATIOTEMPORAL DATA MINING


ABSTRACT

Spatiotemporal data usually contain the states of an object, an event or a position in space over a period of time. Vast amounts of spatiotemporal data can be found in application fields such as traffic management, environment monitoring, and weather forecasting. These datasets may be collected at different locations, at various points in time, and in different formats. The complex structure of spatiotemporal objects and the relationships among them in both spatial and temporal dimensions pose many challenges in representing, processing, analyzing and mining such datasets. In this paper, the issues and challenges related to spatiotemporal data representation, analysis, mining and visualization of knowledge are presented. Various kinds of data mining tasks, such as association rules, classification and clustering, for discovering knowledge from spatiotemporal datasets are examined and reviewed. System functional requirements for this kind of knowledge discovery and database structure are discussed. Finally, applications of spatiotemporal data mining are presented.

KEYWORDS

Spatiotemporal data mining, spatiotemporal data mining issues, spatiotemporal data mining tasks, spatiotemporal data mining applications

1. INTRODUCTION

A spatiotemporal object can be defined as an object that has at least one spatial and one temporal property. The spatial properties are the location and geometry of the object. The temporal property is the timestamp or time interval for which the object is valid. A spatiotemporal object usually contains spatial, temporal and thematic (non-spatial) attributes. Examples of such objects are a moving car, a forest fire, and an earthquake. Spatiotemporal datasets essentially capture changing values of spatial and thematic attributes over a period of time. An event in a spatiotemporal dataset describes a spatial and temporal phenomenon that may happen at a certain time t and location x. Examples of event types are earthquakes, hurricanes, road traffic jams and road accidents. In the real world, many of these events interact with each other and exhibit spatial and temporal patterns which may help to understand the physical phenomena behind them. Therefore, it is very important to efficiently identify the spatial and temporal features of these events and their relationships from large spatiotemporal datasets of a given application domain.

The significance of spatiotemporal data analysis and mining is growing with the increasing availability and awareness of huge amounts of geographic and spatiotemporal data in many important application domains, such as:

• Meteorology: all kinds of weather data, moving storms, tornados, developments of high pressure areas, movement of precipitation areas, changes in freezing level, droughts.
• Biology: animal movements, mating behavior, species relocation and extinction.

• Crop sciences: harvesting, soil quality changes, land usage management, seasonal grasshopper infestation.

• Forestry: forest growth, forest fires, hydrology patterns, canopy development, planning tree cutting, planning tree planting.

• Medicine: patients’ cancer developments, supervising developments in embryology.
• Geophysics: earthquake histories, volcanic activities and prediction.

• Ecology: causal relationships in environmental changes, tracking down pollution incidents.

• Transportation: traffic monitoring, control, tracking vehicle movement, traffic planning, vehicle navigation, fuel efficient routes.

In addition to these individual areas, combinations of phenomena are also of interest. For example, which changes in forests can be linked to which kinds of animal behavior, and which weather developments are responsible for grasshopper infestation? Moreover, some combinations pose particular planning challenges. For example, extreme weather events require rerouting of cars, planes, and ships.

Modeling and representation of spatiotemporal phenomena is complex for two reasons. The first is that the spatial and non-spatial properties of spatiotemporal objects change both continuously and discretely. The second is the influence of collocated neighboring spatiotemporal objects on one another. For example, the spread of a fire is influenced by rain and by changing wind speed and direction. Understanding spatiotemporal phenomena calls for processing, analysis and mining of vast amounts of spatiotemporal data along spatial, temporal and thematic attribute dimensions at multiple levels of granularity.

Spatiotemporal analysis can be categorized as temporal data analysis, spatial data analysis, dynamic spatiotemporal data analysis and static spatiotemporal data analysis. Temporal data analysis fixes the spatial dimension and analyzes how thematic attribute data change with time. Analysis of rainfall, temperature and humidity of a given region over a period of time is an example of this kind. Spatial data analysis analyzes how thematic attribute data change with distance from a spatial reference at a specified time. A study of the change in temperature and humidity values when moving away from the sea coast at a given time is an example of this type. Dynamic spatiotemporal data analysis fixes the thematic attribute dimension and analyzes how spatial properties change with time. Analysis of moving car data and of the spread of a fire are examples of this category. Static spatiotemporal data analysis fixes the temporal and thematic attribute dimensions and studies the spatial dimension. An example is finding locations that have the same rainfall at the same time. Analysis of a large volume of spatiotemporal data without fixing any dimension is very difficult and complex. However, data mining can be used to uncover unknown patterns and trends within the data.

Spatiotemporal data mining is an emerging research area dedicated to the development and application of novel computational techniques for the analysis of large spatiotemporal databases.

It encompasses techniques for discovering useful spatial and temporal relationships or patterns that are not explicitly stored in spatiotemporal datasets. Usually these techniques have to deal with complex objects with spatial, temporal and other attributes. Both spatial and temporal dimensions add substantial complexity to the data mining process. Classical data mining techniques often perform poorly when applied to spatiotemporal datasets for many reasons. First, spatial data is embedded in a continuous space, whereas classical datasets are often expressed in discrete notions such as transactions [1]. Second, the common assumption in classical statistical analysis that data samples are independent is generally false, because spatial data tends to be highly auto-correlated. Other difficulties include the categorization of spatiotemporal patterns, interest measures to quantify them, and the design of computationally efficient and scalable algorithms to mine their instances.

This paper is organized as follows. Section 2 describes issues and challenges in general for spatiotemporal data mining. Different kinds of spatiotemporal data mining tasks and issues related to those tasks are discussed in section 3. Approach for modeling spatiotemporal data mining application and example applications are presented in section 4. Section 5 concludes the paper.

2. ISSUES AND CHALLENGES

General issues and challenges in representation, processing, analysis and mining of spatiotemporal data are described below.

1. Design and development of robust spatiotemporal representation and data structures is the fundamental issue for spatiotemporal data handling, analysis and mining.

2. The unique characteristics of spatiotemporal datasets are that they carry distance and topological information which require geometric and temporal computation.

3. Spatial and temporal relationships like distance, topology, direction, before and after are information bearing. They need to be considered in spatiotemporal data analysis and mining.

4. Spatial and temporal relationships are implicitly defined. They are not explicitly encoded in a database. These relationships must be extracted from data. There is a trade-off between precomputing them before the actual mining process starts and computing them on the fly as and when they are actually needed.

5. Scale effect in space and time is a challenging issue in spatiotemporal data analysis and mining. Scale in terms of spatial resolution or temporal granularity can have a direct impact on the kind and strength of spatiotemporal relationships [2] that can be discovered in datasets.

6. The unique characteristic of spatiotemporal datasets requires significant modification of data mining techniques so that they can exploit the rich spatial and temporal relationships and patterns embedded in the datasets.

7. The attributes of neighboring patterns may have significant influence on a pattern and should be considered. For example, a spatiotemporal event such as a hurricane will influence traffic jam patterns.

8. Many rules of qualitative reasoning (e.g., the transitive property) on spatial and temporal data provide a valuable source of domain independent knowledge that should be taken into account when generating patterns. How to express such rules and how to integrate them with a spatiotemporal reasoning mechanism is an issue.

9. Visualization of spatiotemporal patterns and phenomena, scalability of data mining methods, data structures to represent and efficiently index spatiotemporal datasets are also challenging issues.

10. Development of efficient techniques for visualization of spatiotemporal knowledge and interaction facilities for gaining insight into the underlying phenomena represented by the knowledge is another challenge. This requires that the results of spatiotemporal data mining be embedded within a process that interprets the results for further, properly structured investigation into the reasons behind them.

11. Development of effective visual interfaces for viewing and manipulating the geometrical and temporal attributes of spatiotemporal data is another challenge.

3. SPATIOTEMPORAL DATA MINING TASKS

Regular structures in space and time, in particular repeating structures, are often called patterns. Patterns that describe changes in space and time are referred to as spatiotemporal patterns. Spatiotemporal data mining tasks are aimed at discovering various kinds of potentially useful and unknown patterns and trends from spatiotemporal databases. These patterns and trends can be used for understanding spatiotemporal phenomena, for decision making, or as a preprocessing step for further analysis and mining. The spatiotemporal data mining tasks described in this section are organized by the kind of knowledge to be mined.

3.1 Multidimensional analysis of spatiotemporal data

The multidimensional approach for data analysis is based on the concept of facts analyzed with respect to various dimensions. Spatiotemporal data carries multidimensional information such as time, location, geometry and non-spatial attributes of spatiotemporal objects. A multidimensional spatiotemporal data model integrates spatial and temporal structures to model the existence of spatial objects over time. It also supports multiple concept hierarchies for dimensions like time, location and other attributes. This facilitates spatiotemporal data aggregation on the dimensions and dimension hierarchies, which results in the cuboids of a spatiotemporal data cube. This data cube can be used by spatiotemporal online analytical processing tools [3] to perform static and dynamic spatiotemporal data analysis as well as temporal and spatial data analysis. A multidimensional model of spatiotemporal data makes it possible to discover evolution rules, which describe the manner in which spatial entities change over time. The issue here is the development of new methods and techniques for fast analysis and aggregation of high-dimensional spatiotemporal data [4,5,6].
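As a minimal sketch of aggregation along dimension hierarchies, the Python fragment below rolls hypothetical rainfall facts up from (station, day) to (region, month). The record layout, the station-to-region hierarchy and the name roll_up are assumptions made for illustration; they are not taken from any of the tools cited above.

```python
from collections import defaultdict

# Hypothetical facts: (station, ISO date, rainfall in mm).
FACTS = [
    ("S1", "2010-06-01", 12.0),
    ("S1", "2010-06-02", 8.0),
    ("S2", "2010-06-01", 5.0),
    ("S3", "2010-07-15", 20.0),
]

# Assumed concept hierarchy on the location dimension: station -> region.
STATION_TO_REGION = {"S1": "Coast", "S2": "Coast", "S3": "Inland"}


def roll_up(facts, station_to_region):
    """Aggregate rainfall from (station, day) up to (region, month)."""
    cuboid = defaultdict(float)
    for station, date, rainfall in facts:
        region = station_to_region[station]  # climb the location hierarchy
        month = date[:7]                     # climb the time hierarchy: day -> month
        cuboid[(region, month)] += rainfall
    return dict(cuboid)


monthly = roll_up(FACTS, STATION_TO_REGION)
```

Each key of monthly is one cell of the (region, month) cuboid; choosing a different pair of hierarchy levels would yield a different cuboid of the same cube.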

3.2 Spatiotemporal Characterization

Characterization of spatiotemporal data is performed by applying attribute oriented induction based generalization technique. Generalization is performed on spatial, non-spatial and/or temporal attributes. The attribute oriented induction does the aggregation either by attribute removal or attribute generalization. The attribute generalization involves use of concept hierarchies defined on the attribute dimension for data aggregation. Based on the order in which the generalization of attributes is done, there are different types of generalization. Spatial data dominant generalization [7] fixes the temporal dimension and does generalization of spatial attributes first and then proceeds to generalize non-spatial attributes next. Non-spatial data dominant generalization [7] fixes the temporal dimension and performs generalization on non-spatial attributes first, then generalizes spatial attributes next. Similarly spatial dimension can be fixed for characterization of non-spatial attribute data of a particular location over temporal dimension or non-spatial dimension can be fixed to characterize spatial attributes over temporal dimension. Characterization of spatiotemporal data needs the incorporation of statistical techniques used in application domains for computation and presentation. For example, characterization of climatic conditions of a given geographic region over a period of time has to consider correlations, seasonal effects and extreme values over a period of time [8].

3.3 Spatiotemporal Topological Relationship discovery

The topological relationship between two spatial objects at an instant of time can be any one of disjoints, meets, overlaps, contains, covers, intersects and equals. This relationship may change over time. Discovering the time-varying topology among objects involves processing the evolution of the spatial objects and computing the topological relationships among them at different points of time [2]. The topological relationships among spatial objects can be represented using a graph in which nodes represent spatial objects and edges represent the topological relationship between the nodes. Discovery of time-varying topology therefore produces a series of such graphs, representing the topological relationships among the spatial objects for different time intervals. An experimental program to detect spatiotemporal topological relationships between boundary lines of land parcels is developed in [9].

3.4 Mining Spatiotemporal Topological Relationship Patterns

The topological relationship between two spatial objects may change if the geometry or location of either object changes. The geometry and location changes of spatial objects with time are generally captured and stored in spatiotemporal databases. The changing topological relationship among spatial objects with time is represented using a spatiotemporal topological relationship pattern [10]. For example, the change in topological relationship between two spatial objects O1 and O2 from time t1 to t4 is shown in Fig. 1. The topological relationship pattern for this example can be represented as D-O-C-T, where D, O, C and T correspond to disjoints, overlaps, contains and touches respectively. Support for such patterns can be computed so that it can be used in decision making. If these patterns appear more than a specified number of times, they are called periodic patterns.

3.5 Spatiotemporal Neighborhood

Every spatiotemporal object is associated with some position (x, y) in space and a valid timestamp (ts). Two spatiotemporal objects o1 and o2 are spatial neighbors if the spatial distance between them is less than a specified neighborhood threshold value. The spatial distance between o1 and o2 can be computed as sqrt((o1.x - o2.x)^2 + (o1.y - o2.y)^2). Similarly, o1 and o2 are temporal neighbors if the temporal distance between them is less than a specified time window. The temporal distance can be computed as |o1.ts - o2.ts|. The objects o1 and o2 are spatiotemporal neighbors if they are both spatial neighbors and temporal neighbors. The purpose of spatiotemporal neighborhoods is to provide regions in the data where knowledge discovery tasks such as clustering and outlier detection can be focused. Methods to generate spatial neighborhoods and to discretize temporal intervals are developed in [11] and tested on real life datasets related to sea surface temperature. To capture the concept of "nearby", a neighborhood set N is defined as a set of objects such that every pair of objects in the set are spatiotemporal neighbors. Neighborhood set computation can be used as a preprocessing step for clustering, outlier detection and collocation pattern discovery, and also in online analytical processing. An algorithm for generation of spatiotemporal dynamic neighborhoods is proposed and evaluated in [12] for discovering tele-connected flow anomalies.
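The neighbor definitions above can be sketched directly in Python. The class name STObject, the function names and the parameter names eps (neighborhood threshold) and window (time window) are assumptions introduced for illustration; this is a sketch of the definitions, not the methods of [11] or [12].

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class STObject:
    x: float
    y: float
    ts: float


def spatial_distance(o1, o2):
    # Euclidean distance between the two positions.
    return math.hypot(o1.x - o2.x, o1.y - o2.y)


def temporal_distance(o1, o2):
    # Absolute difference between the two timestamps.
    return abs(o1.ts - o2.ts)


def are_st_neighbors(o1, o2, eps, window):
    """True if o1 and o2 are both spatial and temporal neighbors."""
    return spatial_distance(o1, o2) < eps and temporal_distance(o1, o2) < window


def is_neighborhood_set(objects, eps, window):
    """True if every pair of objects in the collection are spatiotemporal neighbors."""
    return all(are_st_neighbors(a, b, eps, window)
               for a, b in combinations(objects, 2))


a = STObject(0.0, 0.0, 0.0)
b = STObject(3.0, 4.0, 2.0)   # 5 units from a, 2 time units apart
c = STObject(0.0, 1.0, 10.0)  # close to a in space, far in time
```

With eps = 6 and window = 5, a and b are spatiotemporal neighbors, while a and c are not because their timestamps differ by more than the window.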

3.6 Spatiotemporal Association Rules

Spatiotemporal association rules (STARs) can be categorized into three types. 1. Spatiotemporal association rules involving objects that move or migrate from one region to another. 2. Spatiotemporal association rules involving topological relationships. 3. Spatiotemporal association rules involving thematic information of spatial objects.

3.6.1.1 STARs involving moving objects

This category of STARs [13] describes how objects move between regions over time. A STAR that represents spatial objects satisfying conditions q and migrated from one region, say ri, to another region, say rj, during time period [t1, t2] can be specified as

(ri,t1,q) => (rj,t2) [s%, c%] where s is support of the rule and c is the confidence of the rule.

The support is the number of spatial objects that migrated from region ri to region rj in time period [t1,t2] divided by the total number of distinct objects that satisfy q during time period [t1,t2] and are contained in any of the regions in the antecedent or consequent of the rule. The confidence is defined as the ratio of the number of objects that migrated to the total number of objects in region ri at time t1. For example, let R1 and R2 be two spatial regions. R1 contains spatial objects a,b,c,d,e,f and R2 contains spatial objects g,h,i at time t1. Due to migration of the objects, R1 contains the objects a,c,d and R2 contains b,e,f,g,h,i at time t2. The following association rule with support and confidence can be specified.

(R1,t1,q) => (R2,t2) [33%, 50%] where s[(R1,t1,q) => (R2,t2)] = 3/9 = 0.33 and

c[(R1,t1,q) => (R2,t2)] = 3/6 = 0.5
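The worked example can be checked with a few lines of Python. This is a minimal sketch under the assumption that every object satisfies the condition q; the function name star_support_confidence and the set-based region representation are illustrative choices, not the algorithm of [13].

```python
def star_support_confidence(source_t1, target_t1, source_t2, target_t2):
    """Support and confidence of (source, t1, q) => (target, t2).

    Each argument is the set of object ids found in that region at that time.
    Assumes all objects satisfy the condition q.
    """
    # Objects that were in the source region at t1 and in the target region at t2.
    migrated = source_t1 & target_t2
    # Distinct objects contained in either region of the rule during [t1, t2].
    all_objects = source_t1 | target_t1 | source_t2 | target_t2
    support = len(migrated) / len(all_objects)
    confidence = len(migrated) / len(source_t1)
    return support, confidence


r1_t1, r2_t1 = set("abcdef"), set("ghi")
r1_t2, r2_t2 = set("acd"), set("befghi")
support, confidence = star_support_confidence(r1_t1, r2_t1, r1_t2, r2_t2)
```

Here three objects (b, e, f) migrate out of nine distinct objects, so support is 3/9 and confidence is 3/6.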

Based on the analysis of migration of objects among regions over time, spatiotemporal regions can be characterized as stationary regions or high traffic regions. The latter can further be characterized as sources, sinks and thoroughfares. A region r is a stationary region over time interval TI if the ratio of the number of objects that remain in r to the total number of objects in r during TI is more than a user specified minimum support (say, min_sup). A region r is a source if the ratio of the number of objects that left r to the total number of objects in r during TI is more than min_sup. A region r is a sink if the ratio of the number of objects that entered r to the total number of objects in r during TI is more than min_sup. If a region is both a sink and a source, it is identified as a thoroughfare.
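The region categories can be sketched as follows. The text does not spell out how "the total number of objects in r during TI" is counted; this sketch assumes it is the number of distinct objects seen in r at the start or at the end of TI. The function name classify_region and the label strings are likewise illustrative.

```python
def classify_region(at_start, at_end, min_sup):
    """Label a region from the sets of object ids present at the start and end of TI.

    Assumption: the total is the number of distinct objects seen at either time.
    """
    total = at_start | at_end
    labels = set()
    if not total:
        return labels
    stayed = at_start & at_end
    left = at_start - at_end
    entered = at_end - at_start
    if len(stayed) / len(total) > min_sup:
        labels.add("stationary")
    is_source = len(left) / len(total) > min_sup
    is_sink = len(entered) / len(total) > min_sup
    if is_source and is_sink:
        labels.add("thoroughfare")  # both source and sink
    elif is_source:
        labels.add("source")
    elif is_sink:
        labels.add("sink")
    return labels


# Region R1 of the earlier example: three objects stay, three leave, none enter.
r1_labels = classify_region(set("abcdef"), set("acd"), min_sup=0.4)
# A hypothetical busy region: one object stays, three leave, three enter.
busy_labels = classify_region(set("abcd"), set("aefg"), min_sup=0.4)
```

Under this reading a region can exceed the threshold for more than one category at once, which is why the function returns a set of labels rather than a single label.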

3.6.1.2 STARs involving topological relationships

These rules involve spatial topology predicates like S_overlaps and S_Intersects, temporal predicates like T_covers and T_Overlaps, or spatiotemporal topological predicates such as ST_Disjoints, ST_Touchs and ST_Overlaps. Mining rules of this kind requires either preprocessing the spatiotemporal data to find topological relationships and organizing the results so that an association rule mining technique can be applied, or modifying the technique to generate association rules from raw spatiotemporal data. For example, ST_Overlaps(LandParcel1, Flood, Duration1) and T_Covers(Season1, Duration1) => Yield(LandParcel1, Low, Season1) is an association rule of this type.

3.6.1.3 STARs involving Thematic attributes

Association rules involving spatial and temporal features together with thematic (non-spatial) attributes fall in this category. Some kind of preprocessing may be required while generating this type of rule. For example, rain(Ri,t1) and neighborhood(Rj,Ri) => flood(Rj,t2) [s%,c%] needs neighborhood computation.

3.7 Spatiotemporal data classification

Spatiotemporal data classification is a supervised learning technique. It is a two step process: model building and model usage. The model building stage takes a classified spatiotemporal dataset as input and constructs the model using classification techniques like decision trees, neural networks, genetic algorithms or rough sets [14]. The model is then tested for accuracy using a spatiotemporal test dataset. If the model is acceptable, it is used to classify new spatiotemporal objects whose class label is unknown. The techniques used for non-spatial data classification need modification to accept spatial objects and their changes with time. For example, the input layer of a neural network based classifier takes the attribute values of an object as one record to compute connection weights and error in the back propagation learning technique. In the case of spatiotemporal data, however, the attribute values of a spatial object, including its location and shape at different timestamps, have to be considered as one record for the input layer. Grouping of regions into known categories based on known climate conditions [8] using Bayes' theorem is an example of spatiotemporal classification.

3.8 Trend Prediction or Detection

Trend prediction is an important task in spatiotemporal data mining. The prediction of events occurring at particular geographic locations is very important in several application domains. Examples of problems which require location prediction include crime analysis, cellular networking, and natural disasters such as fires, floods, droughts, diseases, and earthquakes. The location and/or geometry of a moving spatial object are dynamic attributes which are functions of time and of other non-spatial attributes whose values change continuously. For example, the location and geometry of a moving cyclone depend on time, wind speed, direction and pressure. Input-output pairs denoted by (xi,yi) are approximated by a function of the form y = f(x), which is then used for prediction. For example, it is possible to predict the spread of a disease to different regions based on geographic locations, highway networks, temperature, wind velocity, time and many other factors using regression and other predictive modeling methods [4]. The Spatial Autoregressive Model (SAR) for linear regression, and neural network based approaches or support vector machines for nonlinear regression, are used in prediction of climate conditions [8]. A Bayesian statistical approach is used in trend prediction of total mercury in Lake Erie [15].
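To make the idea of approximating pairs (xi,yi) by y = f(x) concrete, the sketch below fits a straight line by ordinary least squares and uses it for prediction. This is plain linear regression on made-up numbers; it ignores spatial autocorrelation and is not the SAR model or any other method cited above. The names fit_line and predict are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations: x could be a time step, y a measured attribute.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
forecast = predict(model, 5.0)
```

For spatiotemporal data, x would normally be a vector of several factors (location, time, thematic attributes), and a spatially aware model would add terms for the values at neighboring locations.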

3.9 Spatiotemporal data clustering

Clustering is one of the major data mining methods for knowledge discovery in large databases. It is the process of grouping large data sets according to their similarity. Spatiotemporal clustering algorithms [16,17] have to consider the spatial and temporal neighbors of objects while extracting the clusters. Spatiotemporal clustering has many variants as described below due to varying requirements of different applications.

1. Clustering of regions or locations based on non-spatial attribute values of spatiotemporal objects over a period of time in a given geographic area. If this is applied to traffic management in a city, the resulting spatiotemporal clusters show the regions with more traffic at different points of time in a day.

2. Clustering of spatiotemporal objects which are moving through the regions over a period of time. If this is applied to moving objects like animals, the resulting clusters show herd evolution and the behavior of animals [18]. If it is applied to user history [19], then representatives such as centroids or medoids of the resulting spatiotemporal data clusters give a user mobility profile [19,20].

3. Discovering moving clusters [21,22] from spatiotemporal data where the cluster identity remains same but the objects in the cluster may not be same. If this is applied to moving vehicles, the resulting clusters model the behavior of traffic movement in a given region over a period of time.

4. Trajectory clustering [23] is the process of grouping of similar trajectories during a specific time period. One approach for trajectory clustering is partition-and-group framework [24] in which each trajectory is partitioned into a set of line segments and then similar line segments are grouped together to form a cluster. The issues in trajectory clustering are (i) identifying similarity function (ii) how clustering is to be performed. Trajectory clustering can be used in air space monitoring and traffic planning and control applications [25].

5. The shape clustering technique [26] groups data points based on spatial density. For example, the data points that are packed within a predefined distance can be classified as one group, while the data points that are sparse outside of the neighborhood distances can be clustered as another group. A shape based tracking algorithm [26] can then be used to track and monitor those clusters in a sequence of images. An example of this type is monitoring of ocean objects [26].
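As one simple illustration of how spatial and temporal neighbors can drive cluster extraction, the sketch below groups points that are connected through chains of spatiotemporal neighbors (a single-linkage style grouping built with union-find). It is not any of the cited algorithms [16-26]; the function name st_clusters and the parameters eps and window are assumptions.

```python
import math


def st_clusters(points, eps, window):
    """Group (x, y, ts) points linked by chains of spatiotemporal neighbors.

    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            x1, y1, t1 = points[i]
            x2, y2, t2 = points[j]
            if math.hypot(x1 - x2, y1 - y2) < eps and abs(t1 - t2) < window:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [
    (0, 0, 0), (1, 0, 1), (1, 1, 2),   # one group, close in space and time
    (10, 10, 0), (10, 11, 1),          # a second group elsewhere
    (0, 0, 50),                        # near the first group in space only
]
clusters = st_clusters(points, eps=2, window=3)
```

The last point shares a location with the first group but is separated from it in time, so it ends up in a cluster of its own. The pairwise loop is quadratic; a spatiotemporal index would be needed for large datasets.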

3.10 Spatiotemporal outlier analysis

Outlier analysis reveals unusual objects which appear to be inconsistent with the other objects in the dataset. Outlier objects deviate markedly from other observations. A spatiotemporal outlier can be defined as a spatially referenced object whose non-spatial attribute values are significantly different from those of other objects in its spatial and temporal neighborhoods [27,28,29]. For example, a fast moving vehicle overtaking many other vehicles over a long period of time may not fit into any moving cluster, and it can be detected as an outlier. A three step approach proposed in [27] for discovering spatiotemporal outliers differs from the general approaches to outlier detection, such as distribution based, depth based and distance based approaches, described in [28]. An algorithm is proposed in [29] for discovering spatiotemporal outliers and causal relationships between them. An algorithm for spatiotemporal outlier detection is proposed in [30] and used for detecting outlier sequences in precipitation data. A rough set approach is described in [31] for spatiotemporal outlier detection.
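The definition of a spatiotemporal outlier can be illustrated with a simple neighborhood-deviation test: an object is flagged when its attribute value differs from the mean of its spatiotemporal neighbors by more than a threshold. This is a sketch of the definition only, not the approaches of [27-31]; the function name st_outliers, the fixed absolute threshold and the sample readings are assumptions.

```python
import math


def st_outliers(objects, eps, window, threshold):
    """Return indices of (x, y, ts, value) objects that deviate from their neighbors."""
    outliers = []
    for i, (x, y, ts, value) in enumerate(objects):
        # Attribute values of all other objects within eps in space and window in time.
        neighbor_values = [
            v for j, (nx, ny, nts, v) in enumerate(objects)
            if j != i
            and math.hypot(x - nx, y - ny) < eps
            and abs(ts - nts) < window
        ]
        if not neighbor_values:
            continue  # no neighborhood to compare against
        mean = sum(neighbor_values) / len(neighbor_values)
        if abs(value - mean) > threshold:
            outliers.append(i)
    return outliers


# Hypothetical temperature readings: (x, y, ts, value).
readings = [
    (0, 0, 0, 20.0),
    (1, 0, 0, 21.0),
    (0, 1, 1, 19.5),
    (1, 1, 1, 35.0),   # much warmer than its neighbors
    (50, 50, 0, 5.0),  # isolated, so it has no neighborhood
]
flagged = st_outliers(readings, eps=2, window=2, threshold=10)
```

The isolated reading is not flagged because the definition compares an object against its neighborhood; whether isolated objects should count as outliers is a separate design decision.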

3.11 Spatiotemporal Collocation pattern or episode discovery

A spatiotemporal collocation pattern represents two or more object types whose instances are often located in spatial and temporal proximity. A collocation episode is a sequence of spatiotemporal collocation patterns with some common object types across consecutive time slots. Spatiotemporal collocation discovery uncovers the existence of two or more types of spatial features that are frequently located together. An example is a set of different types of objects that change direction, speed and geographic location in a similar way and move close to each other for some period. An instance of this is that the movement patterns of rabbits and foxes tend to be collocated. Discovering spatiotemporal collocation episodes captures the inter-movement regularities among different types of objects [32]. For example, if a puma is moving near a deer, then a vulture is also going to move close to the same deer with high probability. In a collocation episode, there is a particular object (e.g., deer), called the centre feature, which participates in a sequence of collocations (e.g., deer-puma, deer-vulture). A two phase mining methodology is proposed in [32] to discover frequent collocation episodes. An algorithm is proposed in [33] to discover zonal co-location patterns for dynamic parameters. Another algorithm is discussed in [34] to discover mixed-drove spatiotemporal co-occurrence patterns.
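A very simplified way to quantify how often two object types are collocated is to count the fraction of time slots in which at least one instance of each type lies within a distance threshold of the other. The measure, the function name collocation_ratio and the sample observations below are assumptions for illustration; the interest measures used in [32-34] are more elaborate.

```python
import math
from collections import defaultdict


def collocation_ratio(observations, type_a, type_b, eps):
    """Fraction of time slots in which some type_a and type_b instances are within eps.

    observations: iterable of (slot, object_type, x, y).
    """
    by_slot = defaultdict(list)
    for slot, obj_type, x, y in observations:
        by_slot[slot].append((obj_type, x, y))
    if not by_slot:
        return 0.0

    hits = 0
    for items in by_slot.values():
        a_points = [(x, y) for t, x, y in items if t == type_a]
        b_points = [(x, y) for t, x, y in items if t == type_b]
        if any(math.hypot(ax - bx, ay - by) < eps
               for ax, ay in a_points for bx, by in b_points):
            hits += 1
    return hits / len(by_slot)


observations = [
    (0, "deer", 0, 0), (0, "puma", 1, 0),     # collocated
    (1, "deer", 5, 5), (1, "puma", 5, 6),     # collocated
    (2, "deer", 9, 9), (2, "puma", 20, 20),   # far apart
    (3, "deer", 10, 10),                      # no puma observed
]
ratio = collocation_ratio(observations, "deer", "puma", eps=2)
```

A pair of types whose ratio exceeds a user chosen threshold could be reported as a candidate collocation pattern; chaining such pairs around a common centre feature across consecutive slots leads towards collocation episodes.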

3.12 Discovering Movement Patterns

Movement patterns specify any recognizable spatial and temporal regularity or any interesting relationship in a movement dataset. These patterns are classified [35] as generic patterns and behavioral patterns. Detection and description of movement patterns from spatiotemporal data are essential for better understanding of the behavior of moving objects. A sequence of time stamped point locations describing the path of a moving object is called its trajectory. Given a set of trajectories, the grouping dynamics of the moving entities described by their trajectories can be discovered. Interesting grouping dynamics are flock, leadership pattern, convergence or meeting place, periodic pattern and frequent location. The flock pattern describes a group of entities moving close to each other for an extended period of time. "Close to each other" means inside a circle of some specified radius r. A set of entities can have many flock patterns, and even a single entity can be involved in several flock patterns. The leadership pattern is similar to the flock pattern, except that one of the entities was already moving along the specified trajectory for some time before the flock pattern occurs. Convergence or meeting place refers to a specified number of moving objects converging to the same location for a specified number of time steps. "Same location" is formalized as a circle of specified radius. A periodic pattern describes the behavior of an entity that shows the same spatiotemporal pattern with some periodicity. A frequent location refers to a frequently visited location, that is, a region where a single entity spends a lot of time. Approximation algorithms are described to compute flock, leadership and convergence in
