12-06-2013, 03:08 PM
Data Mining of Medical Data: Opportunities and Challenges in Mining Association Rules
Abstract
Association rules represent knowledge embedded in data sets as proba-
bilistic implications and are intimately related to computation of frequent
item sets. We survey applications of frequent item sets and association
rules in medical practice in such areas as nosocomial infections, adverse
drug reactions, and the interplay between co-morbidities and the lack of
transitivity of association rules.
To make this survey as self-contained as possible we present in an
appendix the Fisher exact test and the χ²-test, enumeration of subsets,
frequent item sets and the Apriori algorithm, and combinatorial properties
of association rules.
Introduction
Data Mining (DM) is the process that discovers new patterns embedded in large
data sets. DM makes use of this information to build predictive models. DM is
grounded in artificial intelligence, databases, and statistics.
The health care industry requires the use of DM because it generates
huge and complex volumes of data, for which un-automated analysis has become
both expensive and impractical. The existence of insurance fraud and abuse
also impels insurers to use DM. DM can generate information that is useful
to all stakeholders in health care, including patients, by identifying effective
treatments and best practices.
DM came into prominence in the mid-1990s because computers made possible the
fast construction of huge data warehouses, containing potentially large amounts
of information. Modern statistical techniques and advances in
probability theory offered the necessary analytical tools.
The history of data and its contents is much older. Huge collections of data
were built over hundreds and thousands of years by various forms of government
and scientists. A famous case is the vast collection of very accurate planetary
observations of the Danish astronomer Tycho Brahe (Dec. 14, 1546, Knutstorp
Castle - Oct. 24, 1601, Prague). The knowledge embedded in this data, the
laws of planetary motion, was discovered by his successor Johannes
Kepler (Dec. 27, 1571, Weil der Stadt - Nov. 15, 1630, Regensburg) and was
confirmed by the work of Newton.
Association Rules and Nosocomial Infections
The development of drug resistance in bacteria involved in intra-hospital
infections has been studied in [8, 10] and many other reports.
Among the Gram-negative bacteria that are notorious for their drug
resistance, Pseudomonas aeruginosa is a common cause of infections in humans;
it is transmitted by medical equipment, including catheters.
Association Rules and Adverse Drug Reactions
Adverse drug reactions (ADEs) pose a serious problem for the health of the
public and cause wasteful expenses [30]. It is estimated that ADEs account
for 5% of hospital admissions [18], 28% of emergency department visits [20],
and 5% of hospital deaths [14]. In the US alone, ADEs result in losses of several
billion dollars annually.
Due to their impact, ADEs are monitored internationally at multiple sites.
The Uppsala Monitoring Center in Sweden, a unit of the World Health Or-
ganization (WHO), mines data originating from individual case safety reports
(ICSRs) and maintains Vigibase, a WHO case safety reporting database. Its
activity started in 1978 and access to Vigibase is allowed for a fee.
The Food and Drug Administration (FDA), a US federal agency, maintains the AERS
database (Adverse Event Reporting System), to which access is free.
In addition, proprietary ADE databases exist at various pharmaceutical
companies, which, by US law, must record adverse reactions to drugs.
We discuss the study performed in [11] and the observations of [30] on using
association rules for mining ADE databases.
An ADE can involve single or multiple drugs and describe single or multiple
adverse reactions. The simplest association rule describing an ADE has the form
Vioxx → heart attack and involves one drug and one reaction. Clearly, rules
of this form cannot capture ADEs that result from undesirable drug interactions,
which are the focus of [11]. This study is based on a set of 162,744 reports of
suspected ADEs submitted to AERS and published in the year 2008. A total of
1167 multi-item ADE associations were identified.
An ADE database has certain unique characteristics that allow for more
efficient mining algorithms. Namely, the set of items is partitioned into two
classes: drugs and symptoms; association rules have the form X → Y, where
X is a set of drugs and Y is a set of symptoms.
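
The drug/symptom partition can be illustrated with a minimal sketch. It assumes
the standard definitions of support (the fraction of reports containing both X
and Y) and confidence (the fraction of reports containing X that also contain
Y). The reports, the item names, the thresholds, and the function names
nonempty_subsets and mine_rules are all hypothetical and chosen only for
illustration. The enumeration is brute force over small subsets; it is not the
Apriori algorithm and not the method of [11].

```python
from itertools import combinations

# Hypothetical toy reports (NOT real AERS data): (set of drugs, set of symptoms).
REPORTS = [
    ({"drug_a", "drug_b"}, {"nausea"}),
    ({"drug_a", "drug_b"}, {"nausea", "rash"}),
    ({"drug_a"}, {"headache"}),
    ({"drug_b"}, {"rash"}),
    ({"drug_a", "drug_b"}, {"nausea"}),
]


def nonempty_subsets(items, max_size):
    """Yield every non-empty subset of `items` with at most `max_size` elements."""
    items = sorted(items)
    for k in range(1, min(max_size, len(items)) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def mine_rules(reports, min_support=0.4, min_confidence=0.8, max_size=2):
    """Return rules X -> Y with X a set of drugs and Y a set of symptoms.

    Because items are partitioned, only (drug subset, symptom subset) pairs
    are counted; mixed item sets are never generated.
    """
    n = len(reports)
    drug_counts = {}  # X -> number of reports whose drugs include X
    pair_counts = {}  # (X, Y) -> number of reports including both X and Y
    for drugs, symptoms in reports:
        symptom_subsets = list(nonempty_subsets(symptoms, max_size))
        for x in nonempty_subsets(drugs, max_size):
            drug_counts[x] = drug_counts.get(x, 0) + 1
            for y in symptom_subsets:
                pair_counts[(x, y)] = pair_counts.get((x, y), 0) + 1

    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        confidence = count / drug_counts[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


if __name__ == "__main__":
    for x, y, support, confidence in mine_rules(REPORTS):
        print(sorted(x), "->", sorted(y), support, confidence)
```

On this toy data only the two-drug rule {drug_a, drug_b} → {nausea} passes both
thresholds; each single-drug rule for nausea has confidence 0.75 and is
rejected, which is the kind of multi-drug association a one-drug rule cannot
express.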