Calibrated Probabilistic Predictions for Biomedical Applications
Abstract—Venn Prediction (VP) is a machine learning framework
that can be used to develop methods that provide well-calibrated
probabilistic outputs. Unlike other probabilistic methods,
the VP framework guarantees validity under the assumption
that the data are independently and identically distributed (i.i.d.).
Well-calibrated probabilistic outputs are of great importance,
especially in biomedical applications. In this work, we develop
a new Venn Predictor based on the Sequential Minimal Optimisation
(SMO) algorithm and we examine its application to two
real-world biomedical problems. We demonstrate in our results
that our method can provide calibrated probabilistic outputs for
predictions without any loss of accuracy. Moreover, we compare
the outputs of our method with the probability outputs of SMO
with logistic regression.
Index Terms—Venn Prediction, Probability outputs,
biomedicine
I. INTRODUCTION
Recent developments in the biomedical research domain
have given rise to many applications in bio-computing science,
especially in machine learning, where high-dimensional
datasets can be modeled. Nevertheless, most machine learning
methods do not provide probabilistic outputs for their predictions,
which are very important for this kind of application.
There are some methods that output probabilities, but these
can sometimes be misleading and unreliable in cases where the
data assumptions are incorrect, or when the task is difficult.
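As a rough illustration of what "well-calibrated" means here, the sketch below bins predicted probabilities and compares the mean prediction in each bin with the observed frequency of the positive class. The function name `reliability_table`, the bin count and the toy data are assumptions made for this example and do not come from this work.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Compare mean predicted probability with observed frequency per bin.

    probs    : predicted probabilities of the positive class, each in [0, 1]
    outcomes : true labels, 1 for positive and 0 for negative
    Returns a list of (mean_predicted, observed_frequency, count) tuples,
    one per non-empty equal-width bin, ordered from low to high probability.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        table.append((mean_p, freq, len(members)))
    return table


# Hypothetical toy data: an overconfident predictor.
example_probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
example_outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
table = reliability_table(example_probs, example_outcomes)
for mean_p, freq, count in table:
    print(f"predicted {mean_p:.2f}  observed {freq:.2f}  n={count}")
```

For a well-calibrated method the two columns should be close in every bin; a large gap suggests that the reported probabilities cannot be taken at face value.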
In this work, we propose the use of Venn Prediction for
producing well-calibrated probabilistic predictions for biomedical
problems. Venn Predictors (VPs) are machine learning
algorithms that can provide reliable probability estimates,
based only on the assumption that the data are independently
and identically distributed (i.i.d.).
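To make the idea concrete, the following is a minimal sketch of the general Venn Prediction procedure, not the SMO-based predictor developed in this work. It assumes a simple 1-nearest-neighbour taxonomy (each example's category is the label of its nearest other example); the function names and the toy data are hypothetical. For each candidate label, the new object is added to the training set with that label, all examples are assigned to categories, and the label frequencies in the new object's category give one probability distribution. The minimum and maximum over the candidate labels form the probability interval for each class.

```python
from collections import Counter


def nearest_label(i, xs, ys):
    """Taxonomy (assumed here): label of the nearest example other than i."""
    nearest = min(
        (j for j in range(len(xs)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])),
    )
    return ys[nearest]


def venn_predict(train_x, train_y, x_new, labels):
    """Return {label: (lower, upper)} probability intervals for x_new."""
    rows = []
    for y_try in labels:
        # Tentatively extend the training set with (x_new, y_try).
        xs = train_x + [x_new]
        ys = train_y + [y_try]
        cats = [nearest_label(i, xs, ys) for i in range(len(xs))]
        # Empirical label frequencies in the category of the new example.
        members = [ys[i] for i in range(len(xs)) if cats[i] == cats[-1]]
        counts = Counter(members)
        rows.append({lab: counts[lab] / len(members) for lab in labels})
    return {
        lab: (min(r[lab] for r in rows), max(r[lab] for r in rows))
        for lab in labels
    }


# Hypothetical one-dimensional toy data with two classes.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
train_y = [0, 0, 0, 1, 1, 1]
intervals = venn_predict(train_x, train_y, (0.15,), labels=[0, 1])
for lab, (lower, upper) in intervals.items():
    print(f"class {lab}: probability in [{lower:.2f}, {upper:.2f}]")
```

The width of each interval gives an indication of how much the estimate depends on the unknown label of the new object; with more training data in each category the intervals typically become narrower.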
The Venn Prediction framework is an extension of the
Conformal Prediction framework, which was introduced in [1].
Conformal Predictors (CPs) provide reliable measures of confidence
for their predictions based on the i.i.d. assumption.
Several CPs have been developed based on various algorithms
such as Support Vector Machines [2], Ridge Regression [3],
k-Nearest Neighbours for classification [4] and for regression
[5], Random Forests [6], and Genetic Algorithms [7].
The computational efficiency of CPs has also been greatly
improved using Inductive Conformal Prediction (ICP) [8],
as demonstrated when combined with Ridge Regression [9],
k-Nearest Neighbours [10], and more recently with Neural
Networks [11], [12]. The CP framework has been successfully
applied to medical problems, such as breast cancer diagnosis
[13], classification of leukaemia subtypes [14], early
detection of ovarian cancer [15], and acute abdominal pain
diagnosis [16].
Unlike CPs, which provide confidence measures, VPs output
probabilistic intervals for each classification. The Venn Prediction
framework was also introduced in [1] where the interested
reader can find a thorough description. Since then, VPs have
been developed based on k-Nearest Neighbours [17], Nearest
Centroid [18] and Neural Networks [19], [20]. Furthermore, a
VP based on a binary SVM has been developed in [21], where
it has been compared with Platt’s method in the batch setting.
An Inductive Venn Predictor (IVP) has also been introduced
in [22]. In this work, we develop a VP based on the Sequential
Minimal Optimisation (SMO) algorithm, and we apply our
method to two real-world biomedical datasets, the Ecoli
and Dermatology datasets, both available from the University of
California, Irvine (UCI) Machine Learning Repository [28].
We conduct experiments with the SMO algorithm, SMO with
logistic regression that outputs probability estimates, and our
VP. We compare the results of all the algorithms, and we
demonstrate the reliability of the probability estimates that are
given by our method.