07-12-2012, 01:33 PM
Building Evolving Ontology Maps for Data Mining and Knowledge Discovery in Biomedical Informatics
Abstract
The explosion of biomedical data and the growing number
of disparate data sources are exposing researchers to a new
challenge - how to acquire, maintain and share knowledge
from large and distributed databases in the context of rapidly
evolving research.
This paper describes research in progress on a new
methodology for leveraging the semantic content of ontologies
to improve knowledge discovery in complex and dynamic
domains. It aims to build a multi-dimensional ontology able to
share knowledge from different experiments undertaken
across aligned research communities in order to connect areas
of science seemingly unrelated to the area of immediate
interest. We analyze how ontologies and data mining may
facilitate biomedical data analysis and present our efforts to
bridge the two fields, knowledge discovery in databases and
ontology learning, for successful data mining in large
databases. In particular, we present an initial biomedical
ontology case study and describe how we are integrating it
with a data mining environment.
INTRODUCTION
The explosion of biomedical data and the growing
number of disparate data sources are exposing researchers
to a new challenge - how to acquire, maintain and share
knowledge from large and distributed databases in the
context of rapidly evolving research. Blagosklonny and
Pardee’s proposal presented the “Conceptual Biology”
challenge: to build a knowledge repository capable of
transforming the current data collection era into one of
hypothesis-driven, experimental research. In doing so we
must consider that, in addition to research-informed
literature, biomedical data is tremendously diverse and can
consist of information stored in genetic code, identified in
genomics and proteomics research by discovering
sequencing patterns, gene functions, and protein-protein
interactions, along with experimental results from various
sources, patient statistics and clinical data.
“CONCEPTUAL BIOLOGY” AND BIOMEDICAL
ONTOLOGIES
Biological knowledge is evolving from structural
genomics towards functional genomics. The tremendous
amount of DNA sequence information that is now available
provides the foundation for studying how the genome of an
organism is functioning, and microarray technologies
provide detailed information on the mRNA, protein, and
metabolic components of organisms [BOD 03].
At the same time, millions of easily retrievable facts are
being accumulated from a variety of sources in seemingly
unrelated fields, and from thousands of journals.
Biological knowledge is evolving so rapidly that it is
difficult for most scientists to assimilate and integrate the
new information with their existing knowledge.
Beyond Conceptual Biology
Considering the facts above, Blagosklonny and Pardee
discuss the emergence of “Conceptual Biology” – the
iterative process of analyzing existing facts and models
available in published literature to generate new
hypotheses. They state, “The conceptual review should take
its place as an essential component of scientific research”.
In doing so, new knowledge can be generated by
‘reviewing’ these accumulated results in a concept-driven
manner, linking them into testable chains and networks
[BLA 02].
Disease/Gene Map
At this stage, Infogene Map focuses primarily on the
gene-disease relationship. We represent these relationships
graphically (figure 4) in a way that enables visualisation
and the creation of new relationships. We use additional
properties to define and weight the items of knowledge
acquired from ECOS. This approach enables us to evolve
the maps as new knowledge is discovered, using the data
mining techniques available in the Neucom environment
[KED 03].
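As a rough illustration of weighted, evolving gene-disease relationships, the following Python sketch stores each relationship with a weight and a list of evidence sources, and revises the weight when new evidence arrives. It is not the Infogene Map implementation: the class and field names, the 0 to 1 weight scale, the running-average update rule, and the placeholder identifiers (GENE_A, DISEASE_X) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A weighted gene-disease link; field names are illustrative only."""
    gene: str
    disease: str
    weight: float                      # assumed confidence scale in [0, 1]
    sources: list = field(default_factory=list)


class GeneDiseaseMap:
    """Hypothetical container for gene-disease relations that can evolve."""

    def __init__(self):
        self._relations = {}           # (gene, disease) -> Relation

    def add_evidence(self, gene, disease, weight, source):
        """Insert a new relation or revise the weight of an existing one."""
        key = (gene, disease)
        rel = self._relations.get(key)
        if rel is None:
            self._relations[key] = Relation(gene, disease, weight, [source])
        else:
            # Assumed update rule: running average over all evidence so far.
            n = len(rel.sources)
            rel.weight = (rel.weight * n + weight) / (n + 1)
            rel.sources.append(source)
        return self._relations[key]

    def genes_for(self, disease, min_weight=0.0):
        """Return genes linked to a disease, strongest link first."""
        hits = [r for r in self._relations.values()
                if r.disease == disease and r.weight >= min_weight]
        hits.sort(key=lambda r: r.weight, reverse=True)
        return [r.gene for r in hits]


gd_map = GeneDiseaseMap()
gd_map.add_evidence("GENE_A", "DISEASE_X", 0.5, "expert")
gd_map.add_evidence("GENE_B", "DISEASE_X", 0.5, "expert")
# New evidence (for example, a rule extracted by data mining) raises the weight.
gd_map.add_evidence("GENE_A", "DISEASE_X", 1.0, "mined-rule")
print(gd_map.genes_for("DISEASE_X"))
```

Keeping the evidence sources alongside each weight is one possible design choice: it lets a later update be traced back to the knowledge that produced it.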
DISCUSSION
All really big discoveries are the result of thought, in
biology as in any other discipline. Allostery, genes, DNA
structure, chemi-osmosis, immunological memory, ion
channels were all once just an idea [BRA 02].
A knowledge repository that is sharable and capable of
moving the current data collection era into one of
hypothesis-driven research is essential to support new
biomedical discoveries. The conceptual biology and
theoretical biology proposals are starting to take us in this
direction. However, to evolve the ontology map alongside
the huge amount of information produced daily worldwide,
any knowledge repository must be flexible enough to
represent information from diverse sources and in different
formats, and must be able to represent dynamic
relationships.
Modeling these data interactions, learning about them,
extracting knowledge, and building a reusable knowledge
base using state-of-the-art AI and soft-computing
techniques will guide future research and practice, and this
is at the core of our research.
CONCLUSION
Although our approach is directed towards many of the
current problems in the ontology integration area, the next
stage is still an open question. Automatic ontology learning
from data is a major challenge for the next development
phase.
Transparent integration between Protégé and Neucom
must be completed so that domain experts can carry out
knowledge acquisition themselves.
The amount of information represented on the Web and
the advance of the Semantic Web will guide our future
implementation. Infogene Map will be translated to
different representation formalisms, such as OWL [OWL
03], so that it can acquire and represent web sources of
information.
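To give a sense of what a translation to OWL might involve, the sketch below writes a handful of gene-disease pairs as a small OWL ontology in Turtle syntax, using only the Python standard library. The base IRI, the class names Gene and Disease, the property associatedWith, and the placeholder identifiers are assumptions for illustration and are not taken from Infogene Map. Relationship weights are left out, because a plain OWL object property assertion links two individuals and has no slot for a weight.

```python
def to_turtle(pairs, base="http://example.org/infogene#"):
    """Render (gene, disease) pairs as a small OWL ontology in Turtle syntax.

    The base IRI and the names Gene, Disease and associatedWith are
    placeholders chosen for this sketch.
    """
    genes = sorted({g for g, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "",
        # Vocabulary: two classes and one object property.
        ":Gene rdf:type owl:Class .",
        ":Disease rdf:type owl:Class .",
        ":associatedWith rdf:type owl:ObjectProperty .",
        "",
    ]
    # Individuals, each declared once, followed by the relationships.
    lines += [f":{g} rdf:type :Gene ." for g in genes]
    lines += [f":{d} rdf:type :Disease ." for d in diseases]
    lines += [f":{g} :associatedWith :{d} ." for g, d in sorted(set(pairs))]
    return "\n".join(lines) + "\n"


turtle_text = to_turtle([("GENE_A", "DISEASE_X"), ("GENE_B", "DISEASE_X")])
print(turtle_text)
```

Carrying the weights across would need an extra modelling step, for example representing each relationship as its own individual with a weight property, which this sketch does not attempt.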
The biomedical ontology is currently small but will be
extended to include lifestyle and other patient-related
variables, as well as other diseases that are being
investigated in our research group.