
Comparison between ANN and Decision Tree in Aerology Event Prediction
Abstract
Predictive systems use historical and other available
data to predict an event. In this paper we compare
the predictive power of the Artificial Neural Network (ANN)
and the Decision Tree (DT) for aerology events, using
time series streams and event streams together with a
combination of the K-means clustering algorithm, the
C5 decision tree algorithm, and ANN. We also try to
find the parameters that affect event occurrence.
First, we find the closest time series record for each
event, which gives us the values of the different
parameters at the time the event occurs. Using K-means,
we add a field to the dataset that records the
cluster of each record, and we then predict the
events using the C5 algorithm and ANN. This framework
and time series model can predict future events
efficiently.
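The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "closest" means nearest in time, it uses a plain Lloyd's K-means seeded with the first k points, and the readings, event names, and function names (`closest_record`, `kmeans`) are hypothetical.

```python
from bisect import bisect_left

def closest_record(times, t):
    """Return the index of the timestamp in sorted `times` nearest to `t`."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier record.
    return i if times[i] - t < t - times[i - 1] else i - 1

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: one reading every three hours.
record_times = [0, 3, 6, 9, 12]                      # hour of each record
records = [[90.0, 2.0], [88.0, 3.0], [40.0, 12.0],   # [relative humidity %, wind speed]
           [35.0, 14.0], [86.0, 2.5]]
events = [(4, "fog"), (8, "dust"), (13, "fog")]      # (hour, event type)

labels = kmeans(records, k=2)

# Each training row: parameters of the closest record, its cluster, the event.
training_rows = []
for hour, event in events:
    i = closest_record(record_times, hour)
    training_rows.append(records[i] + [labels[i], event])
```

The rows in `training_rows` are what a classifier such as a decision tree or an ANN would then be trained on, with the cluster number acting as one extra input field.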
We gathered data from 1961 to 2005 from the aerology
organization for the Tehran Mehrabad station. The data
contain fields such as wet bulb, relative
humidity, amount of cloud, and wind speed. The
dataset includes 17 types of events. Using this
framework, the closest event can be predicted. The C5
method predicts events with 79.55% accuracy
and ANN with 72.87% accuracy. Applying the K-means
clustering algorithm increases the accuracy to 94.59%
for C5 and 92.66% for ANN. We use 10-fold
cross-validation to evaluate the prediction rate.
This framework is the first estimation in the area of
event prediction for a huge aerology dataset and
can be extended to many other datasets in other
environments.
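For readers unfamiliar with 10-fold cross-validation, the evaluation scheme can be sketched as below. This is only an illustration under stated assumptions: the fold assignment (shuffled, round-robin), the helper names, and the majority-class stand-in classifier are hypothetical and are not taken from the paper.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and yield (train, test) index lists for each of k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps fold sizes within one of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_val_accuracy(rows, labels, fit, predict, k=10):
    """Mean test accuracy over k folds for any fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Hypothetical stand-in classifier: always predict the most common training label.
def fit_majority(rows, labels):
    return max(set(labels), key=labels.count)

def predict_majority(model, row):
    return model

rows = [[i] for i in range(50)]
labels = ["rain"] * 40 + ["fog"] * 10
accuracy = cross_val_accuracy(rows, labels, fit_majority, predict_majority)
```

Each record is used for testing exactly once and for training nine times, so the averaged accuracy is less sensitive to one lucky or unlucky split than a single train/test partition would be.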
1. Introduction
Data mining extracts implicit, previously unknown,
and potentially useful information from large datasets
and has been successfully used in a wide variety of
applications and for varied purposes. Data mining can
be a powerful way for companies to gain a competitive
advantage [7].
Several classification models were built using a wide
variety of machine learning and statistical algorithms
to assess the predictability of the phenomenon. The
early experimental results (i.e., prediction accuracy
related performance measures) led us to focus on two
types of machine learning algorithms for this
challenging classification problem: artificial neural
networks and decision trees [18].
Artificial neural networks (ANNs) are commonly
known as biologically inspired, highly sophisticated
analytical techniques, capable of capturing highly
complex non-linear functions. In this study, we use a
popular ANN architecture called Multi-Layer
Perceptron (MLP) with back-propagation (a supervised
learning algorithm) [6].
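To make the MLP with back-propagation concrete, here is a minimal sketch with one hidden layer, sigmoid activations, and a squared-error loss. The network size, learning rate, toy data, and the class name `TinyMLP` are assumptions chosen for illustration; the paper does not specify its architecture at this level of detail.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, sigmoid activations, squared-error loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each unit stores its input weights followed by a bias weight.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0])))
                  for ws in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        hidden, output = self.forward(x)
        # Output delta: derivative of 0.5 * (output - target)^2 w.r.t. the
        # output unit's pre-activation.
        delta_out = (output - target) * output * (1.0 - output)
        # Hidden deltas: propagate the output delta back through w_out
        # (computed before w_out is updated).
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                ws[i] -= lr * delta_hidden[j] * v

    def loss(self, data):
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in data)

# Hypothetical toy task: the event occurs unless both inputs are low.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = TinyMLP(n_in=2, n_hidden=3)
loss_before = net.loss(data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
loss_after = net.loss(data)
```

"Supervised" here means that every training example carries its target value, and the weight updates are driven by the difference between the network output and that target.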
DTs are an effective method of classifying dataset
entries and can provide good decision support
capabilities. DTs have several advantages over other
data mining methods, including being human-interpretable,
well-organized, computationally
inexpensive, and capable of dealing with noisy data.
Due to these merits, DTs are probably the most popular
data mining method [20].
The most commonly used splitting criteria include
entropy-based information gain (used
in ID3, C4.5, C5), the Gini index (used in CART), and the
chi-squared test (used in CHAID). Based on the favorable
prediction results we obtained from the
preliminary runs, in this study we chose to use the C5
algorithm as our decision tree method [6].
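The first two splitting criteria can be illustrated with a short sketch. The function names and the toy labels are hypothetical, and the code shows only the textbook definitions of entropy, information gain, and Gini impurity, not the full split-selection logic of C5 or CART.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical node with eight records and one candidate split.
parent = ["fog"] * 4 + ["clear"] * 4
split = [["fog", "fog", "fog", "clear"], ["fog", "clear", "clear", "clear"]]

gain = information_gain(parent, split)           # about 0.189 bits
perfect = information_gain(parent, [["fog"] * 4, ["clear"] * 4])  # 1 bit
```

A tree builder evaluates such a score for every candidate split at a node and keeps the split that yields the purest children.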
Event logs have been used on many computer systems
for recording errors occurring in the hardware and
software components of the systems. These logs are
used by system administrators to monitor the health of
the machines. Successful prediction of errors in a computer
system offers the promise of enabling significantly
improved system management [1, 4].
In these domains many events may occur over the
lifetime of a system. Successful applications of
induction methods hold obvious benefits, and there
is a large literature on computational methods for
regression and time series prediction [11, 12].