01-01-2013, 10:35 AM
Data Grid Mining of Mobile User Behaviors in Web Environments
Abstract
Mobile E-Commerce provides location-based services to mobile users in web
environments. One of the best ways to personalize mobile services is by location. In
this paper, we propose a new algorithm called the Distributed Pattern Miner (DPM), for
mining location-aware service request patterns from distributed databases on a Data Grid.
The location and service request patterns represent frequently requested services and the
corresponding locations of mobile users in mobile web environments. These patterns are
used to predict the next location of mobile users and their future service requests. The
grid provides effective computational and communication support for distributed data
mining applications. We built a data grid system on a cluster of workstations using the open
source Globus Toolkit (GT). We compared the performance of an existing
conventional distributed mining algorithm with that of the DPM in a grid environment, in terms of
computation time. The experimental results show that the DPM performs better
than both sequential and conventional distributed algorithms.
Introduction
The rapid development of wireless and World Wide Web (WWW) technologies enables mobile users
to request various kinds of services via mobile devices, while moving across different locations. A
mobile user may submit a service request, e.g. “find the nearby medical shops”, at location A. The user
can then move to another location B and submit another service request, “find the restaurants nearby”.
The mobile user can submit various kinds of service requests in different locations. In a mobile web
environment, the mobile users' service requests and location logs are accumulated as a large data set in
a distributed database [1], [2]. These databases need to be analyzed to generate location aware service
patterns using data mining techniques. Predicting the behavior of mobile users in terms
of location and service request is therefore required for efficient service provision, and it is the main
objective of this paper.
Related Works
Apriori, first proposed by Agrawal et al. [6], [7], is an efficient algorithm for mining association rules.
A number of further studies build on the Apriori algorithm. The paper [6] deals with an
algorithm for generating association rules on a large database. Some studies were made on data mining
techniques to predict the location of mobile users in a web environment [8]. In [9], a data mining
algorithm is executed on a centralized server to find the mobile user’s location aware service request
patterns. If the source log file is very large, a centralized algorithm leads to problems in data
communication, processing overhead and security, so mining the data sets at the distributed sites is a more
efficient method. Some studies have been done on distributed data mining for association rule mining [10].
Existing conventional distributed data mining systems, however, offer limited
performance.
A few research works currently exist on distributed mining in data grids [11], [12], [13], [14].
The paper [11] deals with the execution of the association rule mining algorithm to generate frequent
item sets from the database distributed on a data grid. It consists of a local mining phase and a global
mining phase. This algorithm is scalable as its communication and synchronization overhead is
relatively small. In [12], a service oriented architecture for distributed data mining based on the Open
Grid Services Architecture (OGSA) is proposed. This approach is an attempt to develop the
infrastructure to integrate distributed resources. Distributed data mining is widely used in the
knowledge discovery process to analyze large sized databases maintained over geographically
distributed sites. The paper [13] focuses on the design and implementation of the Knowledge Grid, an
infrastructure for geographically distributed high-performance applications.
Distributed Data Mining on the Data Grid System
In this paper, we developed a Data Grid system with distributed databases, using the open source
Globus Toolkit 4 [21], and implemented a new distributed two-dimensional association rule mining
algorithm named the Distributed Pattern Miner. The mining algorithm generates
mobile users' location-aware service request patterns from the database distributed in a data grid. Fig. 1
shows the architecture of the data mining system using data grid services.
Distributed Pattern Miner Algorithm
We examine the distributed mining of location aware service request patterns of mobile users from the
distributed database on a data grid system. The DPM is applied to the moving logs and service request
logs of mobile users stored in geographically distributed databases to generate location aware service
request association rules, which support various location based services for mobile users. The data
source nodes are connected together to form a data grid system. The DPM algorithm is
executed on each data source node of the data grid to find the local frequent location and service request
patterns, and the local results are integrated to generate the global frequent patterns.
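The local and global phases described above can be illustrated with a minimal Python sketch. This is not the authors' DPM implementation: it assumes each node's log is a list of (location, service) pairs, and it merges full local counts rather than only the locally frequent patterns. The names local_counts, global_frequent and min_support are hypothetical.

```python
from collections import Counter

def local_counts(log):
    """Local phase: count each (location, service) pair in one node's log."""
    return Counter(log)

def global_frequent(node_logs, min_support):
    """Global phase: merge per-node counts and keep the pairs whose
    support over all records reaches min_support."""
    total = Counter()
    n_records = 0
    for log in node_logs:
        total.update(local_counts(log))
        n_records += len(log)
    return {pair: count / n_records
            for pair, count in total.items()
            if count / n_records >= min_support}

# Hypothetical logs held by two data source nodes.
node_a = [("A", "pharmacy"), ("B", "restaurant"), ("A", "pharmacy")]
node_b = [("A", "pharmacy"), ("C", "fuel"), ("B", "restaurant")]
patterns = global_frequent([node_a, node_b], min_support=0.3)
```

With these example logs, only the pairs requested in at least 30% of all records survive the global phase.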
Next Location and Service Request Prediction
This is the last step of our algorithm. In this step, the next location and service request of the mobile
user in a web environment are predicted. Assume that, so far, the mobile user has followed the pattern of
locations and requested services P = <P1, P2, P3, …, Pk-1>. Our system finds the rules whose head
parts are contained in pattern P, and also the last location and service request as Pk. The output of the
matching process is a set of location aware service request association rules called matching rules. The
tail part of a matching rule represents the prediction made for the mobile user’s next location and
service request. The matching rules are sorted in descending order of confidence value for
prediction. The parameter m represents the maximum number of matching rules used for
predicting the user’s next location and service request.
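The matching step can be sketched in Python as follows. This is an illustration only, not the authors' code: it assumes "contained in" means that the head is an ordered subsequence of P, it ignores any further condition on the last element, and the Rule type and the function names are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    head: tuple        # sequence of (location, service) pairs
    tail: tuple        # predicted next (location, service) pair
    confidence: float

def is_subsequence(head, pattern):
    """True if the items of head occur in pattern in the same order."""
    remaining = iter(pattern)
    return all(step in remaining for step in head)

def predict(rules, pattern, m):
    """Return the tails of the top-m matching rules by confidence."""
    matching = [r for r in rules if is_subsequence(r.head, pattern)]
    matching.sort(key=lambda r: r.confidence, reverse=True)
    return [r.tail for r in matching[:m]]

# Hypothetical pattern followed so far and a hypothetical rule set.
pattern = [("A", "pharmacy"), ("B", "restaurant")]
rules = [
    Rule((("A", "pharmacy"),), ("C", "fuel"), 0.6),
    Rule((("A", "pharmacy"), ("B", "restaurant")), ("D", "atm"), 0.8),
    Rule((("E", "hotel"),), ("F", "taxi"), 0.9),
]
top_two = predict(rules, pattern, m=2)
```

The rule with head E is the most confident but does not match the pattern, so it is never used for prediction.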
Impact of Precision
In our experiment, the parameter m represents the number of predictions made each time the mobile
user moves from one location to another. As m increases, the number of
correct predictions increases, but so does the probability of predicting incorrect locations.
Fig. 3 shows that as m increases, the precision of the predictions made by the DPM
algorithm decreases at a slower rate. Precision is calculated as the ratio of the number of correctly
predicted locations to the total number of predictions made.
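The precision measure can be written out as a short Python sketch. The data below is made up for illustration and is not taken from the experiments. The sketch assumes that each move yields a list of up to m distinct predicted locations and that a move counts as one correct prediction when the actual next location is in that list.

```python
def precision(predictions, actual):
    """Ratio of correctly predicted locations to all predictions made.

    predictions: one list of predicted locations per move
    actual: the true next location for each move
    """
    total = sum(len(p) for p in predictions)
    correct = sum(1 for p, a in zip(predictions, actual) if a in p)
    return correct / total if total else 0.0

# Hypothetical moves: the same three true locations, predicted with m = 1 and m = 2.
actual = ["B", "A", "F"]
p_m1 = precision([["B"], ["A"], ["C"]], actual)
p_m2 = precision([["B", "C"], ["A", "D"], ["C", "E"]], actual)
```

In this toy case the number of correct predictions stays at two while the number of predictions doubles, so precision drops from 2/3 to 1/3 as m goes from 1 to 2.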