23-08-2012, 01:22 PM
Towards the Integration of Multiagent Applications and Data Mining
Abstract
This chapter presents research on combining two originally
separate areas, agents (including distributed multiagent systems) and data mining,
which are increasingly interrelated. Recent research has shown that this interaction
is bilateral and complementary, since new approaches and techniques
are being developed to benefit from the synergetic enhancement of intelligence
and infrastructure for information processing and systems. This chapter illustrates
agent-mining interaction with two multiagent applications from different
domains: BioAgents in bioinformatics and MADIK in computer
forensics. The presented case studies are driving forces towards integration
in the challenging agent-mining area. As this is ongoing research, we discuss the
prospects of both agent-mining projects.
Overview of Agents/Multiagent Systems and Data Mining Integration
In the past decade, agents/multiagent systems (MAS) and data mining (DM)/knowledge
discovery have emerged as two increasingly interrelated research areas, opening
space for the agent-mining interaction and integration (AMII) research field. This
new field has driven efforts from both sides to find benefits and complementarity
for both communities. The AMII field has proved so promising in recent years
that there is a special interest group on Agents and Data Mining Interaction
and Integration (AMII-SIG), which aims to provide a forum for boosting research
and development in AMII studies [4].
Related Work
Since the research topics in the area of AMII are quite diverse and ubiquitous, there
are some works that may be related to all streams of the emerging area, which
were presented in [27]: [15, 57, 19]. On the integration of MAS
and DM, we can cite [49, 43]. In [49] we find a proposed integration model, MultiAgent
System for Distributed Data Mining (SMAMDD), where agents perform mining
tasks locally and merge their results into a consistent global model. To
achieve that, agents cooperate by exchanging messages, aiming to improve the
knowledge discovery process and generate accurate results. A Web document database
integration technique is presented in [43], where mining agents were defined for
information extraction from HTML files.
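The idea of agents mining locally and merging their results into a global model can be illustrated with a minimal sketch. It is not the SMAMDD model of [49]: the class MiningAgent, the function merge_local_models, the message layout and the toy transactions are all assumptions made for illustration, and the mining task (counting co-occurring item pairs) was chosen only because its local results can be merged exactly by addition.

```python
from collections import Counter
from itertools import combinations


class MiningAgent:
    """Hypothetical agent that mines only the data it holds locally."""

    def __init__(self, name, transactions):
        self.name = name
        self.transactions = transactions

    def mine_locally(self):
        """Count co-occurring item pairs and return them as a message."""
        pair_counts = Counter()
        for transaction in self.transactions:
            # Sort and deduplicate so each pair has one canonical form.
            for pair in combinations(sorted(set(transaction)), 2):
                pair_counts[pair] += 1
        return {
            "agent": self.name,
            "n_transactions": len(self.transactions),
            "pair_counts": pair_counts,
        }


def merge_local_models(messages, min_support):
    """Merge the agents' messages into one global model of frequent pairs."""
    total = sum(m["n_transactions"] for m in messages)
    merged = Counter()
    for message in messages:
        merged.update(message["pair_counts"])  # counts are additive
    return {
        pair: count / total
        for pair, count in merged.items()
        if count / total >= min_support
    }


# Toy data, invented for this example: each agent sees a different subset.
agents = [
    MiningAgent("a1", [["x", "y", "z"], ["x", "y"]]),
    MiningAgent("a2", [["y", "z"], ["x", "y"]]),
]
messages = [agent.mine_locally() for agent in agents]
global_model = merge_local_models(messages, min_support=0.5)
print(global_model)
```

Only the summarized counts travel between agents, not the raw transactions, which is the property that makes this kind of design attractive when data is distributed.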
Considering the MAS application domains, we now present related work in
bioinformatics. The work of Santos and Bazzan [21, 26] deals with knowledge
discovery and data mining. In this work, the authors propose a MAS for the
bioinformatics area, where agents are responsible for applying different machine learning
algorithms to subsets of the data to be mined, and are able to cooperate to
discover knowledge from these subsets. The authors propose a case study that uses
cooperative negotiation to construct an integrated domain model from several sources.
In the bioinformatics scenario, the approach was applied to the automated
annotation of protein keywords. In this work, agents do not use any domain-dependent
information, as they just encapsulate data and the machine learning algorithms
used to induce models that predict the annotation, using data from biological
databases. In previous work [10, 2] we can find efforts to build a system for the
automated annotation of proteins related to Mycoplasmataceae family data. A similar
work on automatic annotation uses symbolic machine learning techniques with
an ISA [36].
BioAgents
An enormous volume of deoxyribonucleic acid (DNA) sequences of organisms is continuously
being discovered by genome sequencing projects around the world. Predicting the
biological function of these DNA sequences is a key
activity in genome projects. This task is done in the annotation phase, which is divided
into automatic and manual annotation. Automatic annotation aims to
find, for each DNA sequence identified in the project, similar sequences among the
millions stored in public databases, e.g. GenBank [13], by using approximate pattern
matching algorithms (BLAST [5] and FASTA [50]). Manual annotation is
done by biologists, who use the results produced by the automatic annotation,
together with their knowledge and experience, to decide the function prediction for each
DNA sequence. In this way, the biologists ensure the accuracy and correctness of each
sequence function prediction.
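The automatic annotation step can be pictured with a small sketch that ranks database sequences by similarity to a query and returns the top candidates for a biologist to review. It does not implement BLAST or FASTA: it substitutes a much cruder measure (Jaccard similarity over shared k-mers) purely for illustration. The function names, the toy sequences and the function labels are invented for this example and do not come from GenBank or from the BioAgents project.

```python
def kmers(sequence, k=3):
    """Return the set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    kmers_a, kmers_b = kmers(a, k), kmers(b, k)
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def suggest_annotations(query, database, top_n=2):
    """Rank database entries by similarity to the query, best first.

    Returns (score, entry_name, function_label) tuples that a human
    annotator would still have to confirm or reject.
    """
    scored = [
        (similarity(query, sequence), name, function)
        for name, (sequence, function) in database.items()
    ]
    # Highest score first; ties broken by entry name for determinism.
    scored.sort(key=lambda hit: (-hit[0], hit[1]))
    return scored[:top_n]


# Toy database, invented for this example.
database = {
    "seqA": ("ATGGCGTACGTTAGC", "hypothetical kinase"),
    "seqB": ("ATGGCGTACGATAGC", "hypothetical transporter"),
    "seqC": ("TTTTTTTTTTTTTTT", "hypothetical repeat region"),
}
query = "ATGGCGTACGTTAGC"
hits = suggest_annotations(query, database)
for score, name, function in hits:
    print(f"{name}: {function} (similarity {score:.2f})")
```

The ranked list corresponds to the output of the automatic phase; deciding which suggested function, if any, is correct remains the manual phase described above.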
MADIK
Computer forensics consists of the examination and analysis of computational systems,
which demands a lot of resources due to the large amount of data involved. Thus, the
success of computer forensics examinations depends on the ability to examine large
amounts of digital forensic data in search of important evidence. Forensic examination
consists of several steps to preserve, collect and analyze evidence found
in digital storage media, so that it can be presented and used as evidence of unlawful
actions. In this scenario, both distributed agent/multiagent architectures and
DM techniques can be of great help. In real computer forensics cases, experts cannot
define at first which evidence is most relevant to the incident or crime under investigation.
Thus, a pre-analysis of the suspect machines would limit the amount of
evidence collected for examination, reducing the time of investigation and analysis
by the forensic experts. However, intelligent and flexible tools to help forensic
experts with the pre-analysis phase, and with a concrete cross-analysis of a large
number of potentially correlated pieces of evidence, are still lacking.
Evaluation and Future Work
There are many different ways to tackle the emergent AMII topic. In the bioinformatics
domain, the MAS architecture defined for BioAgents, described in
Section 2.3, is well suited to the integration of DM techniques. The
physical layer, which is composed of many public biological databases, e.g.
nr, COG, GO, KOG and Swiss-Prot, can be automatically mined with many different
algorithms defined in sets of ISA in a distributed and parallel way. Although automated
annotation has not been the focus of the BioAgents project so far, this
process could certainly improve its performance through the integration of DM techniques. For this
purpose, a very good computational infrastructure is necessary to deal with the
volume of real-world biological data, even counting on the cooperative skills of the agent
society.