23-05-2012, 12:22 PM
Data mining: a database perspective
Abstract
Data mining on large databases has been a major concern in the research
community, due to the difficulty of analyzing huge volumes of data using only
traditional OLAP tools. This sort of process requires a lot of computational
power, memory and disk I/O, which can only be provided by parallel
computers. We present a discussion of how database technology can be
integrated with data mining techniques. Finally, we also point out several
advantages of addressing data-consuming activities through a tight integration
of a parallel database server and data mining techniques.
1 Introduction
Data mining techniques have increasingly been studied [7, 9, 21], especially
in their application to real-world databases. One typical problem
is that databases tend to be very large, and these techniques
often repeatedly scan the entire data set. Sampling has been used for a
long time, but in a sample, subtle differences among sets of objects become less
evident.
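The cost of sampling can be made concrete with a small sketch. It is only an illustration under stated assumptions: the data set, the group labels, and the helper names `count_rare` and `sample_records` are hypothetical and do not come from the paper. With 20 rare records among 100,000, a 1% simple random sample holds 0.2 of them on average, so the rare group will often be missing from the sample altogether.

```python
import random


def count_rare(records, label="rare"):
    """Count the records that carry the given label."""
    return sum(1 for r in records if r == label)


def sample_records(records, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    k = int(len(records) * fraction)
    return rng.sample(records, k)


# Hypothetical data set: 100,000 records, 20 of which form a rare group.
records = ["common"] * 99_980 + ["rare"] * 20

full_count = count_rare(records)
sample = sample_records(records, fraction=0.01)
sample_count = count_rare(sample)

print(f"full scan: {full_count} rare records out of {len(records)}")
print(f"1% sample: {sample_count} rare records out of {len(sample)}")
```

A full scan always sees all 20 rare records, which is why techniques that need such subtle patterns tend to scan the entire data set despite the cost.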
This work provides an overview of some important data mining
techniques and their applicability to large databases. We also point out
several advantages of using a database management system (DBMS)
to manage and process information instead of conventional
flat files.
This approach has been a major concern of several researchers, because
it represents a very natural solution: DBMSs have been
successfully used in business management and may currently store
valuable hidden knowledge.
Data Mining Techniques
Data mining is a step in knowledge discovery in databases (KDD)
that searches for a series of hidden patterns in data, often involving
a repeated iterative application of particular data mining methods.
The goal of the whole KDD process is to make patterns understandable
to humans in order to facilitate a better interpretation of the
underlying data [11].
We present four classes of data mining techniques typically used
in a variety of well-known applications and research currently cited
in the database mining community. They certainly do not represent
all mining methods, but they account for a considerable portion of them when a
large amount of data is considered.
Data Mining and DBMSs
Database technology has been successfully used in traditional business
data processing. Companies have been gathering large amounts
of data, using a DBMS to manage it. Therefore, it is desirable
to have easy and painless use of database technology within
other areas, such as data mining.
DBMS technology offers many features that make it valuable
when implementing data mining applications. For example, it is pos-
sible to work with data sets that are considerably larger than main
memory, since the database itself is responsible for handling informa-
tion, paging and swapping when necessary. Besides,
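As a rough illustration of the point about data sets larger than main memory, the sketch below lets the database engine compute a grouped aggregate, so the application only ever holds one small result row per group. It rests on assumptions not found in the paper: SQLite (Python's standard `sqlite3` module) stands in for the parallel database server, and the table `sales`, its columns, and the helper names `build_db` and `totals_per_item` are hypothetical.

```python
import sqlite3


def build_db(path=":memory:", n_rows=10_000):
    """Create a hypothetical sales table and fill it with synthetic rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (item TEXT, amount INTEGER)")
    # A generator feeds the rows one at a time, so the full data set
    # is never materialized as a Python list.
    rows = ((f"item{i % 5}", i % 100) for i in range(n_rows))
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_per_item(conn):
    """Let the engine scan and aggregate; fetch only one row per group."""
    cur = conn.execute(
        "SELECT item, SUM(amount), COUNT(*) FROM sales "
        "GROUP BY item ORDER BY item"
    )
    return {item: (total, count) for item, total, count in cur}


conn = build_db()
totals = totals_per_item(conn)
for item, (total, count) in totals.items():
    print(item, total, count)
```

Here the database lives in memory for convenience. Passing a file path to `build_db` keeps the table on disk, and the engine then reads pages as it needs them while the application code stays the same.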
Conclusions
Data mining and its application to large databases have been extensively
studied due to the increasing difficulty of analyzing large
volumes of data using only OLAP tools. This difficulty pointed out
the need for an automated process to discover interesting and hidden
patterns in real-world data sets. The ability to handle large amounts
of information has been a major concern in many recent data mining
applications. Parallel processing plays an important role in
this context, since only parallel machines can provide sufficient computational
power, memory and disk I/O.