Theoretical Bioinformatics and Machine Learning pdf

**project girl** · 18-01-2013, 01:48 PM

Theoretical Bioinformatics and Machine Learning

Introduction

This course is part of the curriculum of the master of science in bioinformatics at the Johannes
Kepler University Linz. Machine learning has a major application in biology and medicine and
many fields of research in bioinformatics are based on machine learning. For example, one of the
most prominent bioinformatics textbooks “Bioinformatics: The Machine Learning Approach” by
P. Baldi and S. Brunak (MIT Press, ISBN 0-262-02506-X) sees the foundation of bioinformatics
in machine learning.
Machine learning methods, for example neural networks used for the secondary and 3D structure
prediction of proteins, have proven their value as essential bioinformatics tools. Modern measurement
techniques in both biology and medicine create a huge demand for new machine learning
approaches. One such technique is the measurement of mRNA concentrations with microarrays,
where the data is first preprocessed, then genes of interest are identified, and finally predictions
made. In other examples DNA data is integrated with other complementary measurements in order
to detect alternative splicing, nucleosome positions, gene regulation, etc. All of these tasks are performed
by machine learning algorithms. Alongside neural networks the most prominent machine
learning techniques relate to support vector machines, kernel approaches, projection methods, and
belief networks. These methods provide noise reduction, feature selection, structure extraction,
classification / regression, and assist modeling. In the biomedical context, machine learning algorithms
predict cancer treatment outcomes based on gene expression profiles, they classify novel
protein sequences into structural or functional classes and extract new dependencies between DNA
markers (SNPs, single nucleotide polymorphisms) and diseases (e.g. schizophrenia or alcohol dependence).
In this course the most prominent machine learning techniques are introduced and their mathematical
foundations are shown. However, because of the restricted space neither mathematical nor
practical details are presented. Only a few selected applications of machine learning in biology and
medicine are given as the focus is on the understanding of the machine learning techniques. If the
techniques are well understood then new applications will arise, old ones can be improved, and
the methods which best fit the problem can be selected.

Basics of Machine Learning

The conventional approach to solving problems with the help of computers is to write programs
which solve the problem. In this approach the programmer must understand the problem, find
a solution appropriate for the computer, and implement this solution on the computer. We call
this approach deductive because the human deduces the solution from the problem formulation.
However, in biology, chemistry, biophysics, medicine, and other life science fields a huge amount
of data is produced which is hard for humans to understand and interpret. A solution to a
problem may also be found by a machine which learns. Such a machine processes the data and
automatically finds structures in the data, i.e. learns. The knowledge about the extracted structure
can be used to solve the problem at hand. We call this approach inductive. Machine learning is
about inductively solving problems by machines, i.e. computers.
Researchers in machine learning construct algorithms that automatically improve a solution
to a problem with more data. In general the quality of the solution increases with the amount of
problem-relevant data which is available.
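
The claim that solution quality grows with the amount of data can be illustrated with a small sketch. It is not taken from the course: the two Gaussian classes, their means (0 and 2), and the function names are assumptions made only for this illustration. The "solution" here is a decision threshold estimated as the midpoint between the two class means, and its average distance to the true midpoint is measured for several sample sizes.

```python
import random
import statistics

TRUE_THRESHOLD = 1.0  # midpoint of the assumed class means 0.0 and 2.0


def estimate_threshold(n, rng):
    """Estimate the midpoint between two class means from n samples per class."""
    class_a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical class A
    class_b = [rng.gauss(2.0, 1.0) for _ in range(n)]  # hypothetical class B
    return (statistics.fmean(class_a) + statistics.fmean(class_b)) / 2.0


def mean_abs_error(n, repeats=200, seed=0):
    """Average absolute error of the estimated threshold over many repetitions."""
    rng = random.Random(seed)
    errors = [abs(estimate_threshold(n, rng) - TRUE_THRESHOLD)
              for _ in range(repeats)]
    return statistics.fmean(errors)


# Average error for small, medium, and large data sets.
errors_by_n = {n: mean_abs_error(n) for n in (5, 50, 500)}
for n, err in errors_by_n.items():
    print(f"n={n:4d}  mean absolute error={err:.3f}")
```

Averaged over repetitions, the error should shrink as n grows. A single run with little data can still be lucky or unlucky, which is why the sketch averages over 200 repetitions.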
Problems solved by machine learning methods include classifying observations, predicting
values, structuring data (e.g. clustering), compressing data, visualizing data, filtering data, selecting
relevant components from data, extracting dependencies between data components, modeling
the data generating systems, constructing noise models for the observed data, and integrating data from
different sensors.

Introductory Example

In the following we will consider a classification problem taken from “Pattern Classification”,
Duda, Hart, and Stork, 2001, John Wiley & Sons, Inc. In this classification problem salmon must
be distinguished from sea bass given pictures of the fish. The goal is an automated system that is
able to separate the fish in a fish-packing company, where salmon and sea bass are sold. We
are given a set of pictures for which experts have stated whether the fish in the picture is a salmon or a sea
bass. This set, called the training set, can be used to construct the automated system. The objective
is that future pictures of fish can be used to automatically separate salmon from sea bass, i.e. to
classify the fish. Therefore, the goal is to correctly classify fish in the future on unseen
data. The performance on future novel data is called generalization. Thus, our goal is to maximize
the generalization performance.
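
The difference between fitting the training set and generalizing to unseen data can be sketched as follows. This is only an illustration under stated assumptions: instead of pictures, each fish is reduced to a single hypothetical feature (its length), and the length distributions are invented rather than taken from the textbook. A threshold is chosen to minimize the error on the training set, and is then evaluated on a separate set the classifier has never seen.

```python
import random


def make_fish(n, rng):
    """Generate n hypothetical (length_cm, label) pairs; the distributions are invented."""
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            data.append((rng.gauss(60.0, 8.0), "salmon"))
        else:
            data.append((rng.gauss(80.0, 8.0), "sea bass"))
    return data


def classify(length, threshold):
    """Call a fish salmon if it is shorter than the threshold, sea bass otherwise."""
    return "salmon" if length < threshold else "sea bass"


def error_rate(data, threshold):
    """Fraction of fish in data that the threshold rule classifies wrongly."""
    wrong = sum(1 for length, label in data if classify(length, threshold) != label)
    return wrong / len(data)


def fit_threshold(train):
    """Pick the observed length that gives the lowest error on the training set."""
    candidates = sorted(length for length, _ in train)
    return min(candidates, key=lambda t: error_rate(train, t))


rng = random.Random(42)
train_set = make_fish(200, rng)  # pictures labeled by the experts
test_set = make_fish(200, rng)   # stands in for future, unseen fish

threshold = fit_threshold(train_set)
train_error = error_rate(train_set, threshold)
test_error = error_rate(test_set, threshold)
print(f"threshold={threshold:.1f} cm  "
      f"train_error={train_error:.3f}  test_error={test_error:.3f}")
```

The error on the test set is the estimate of the generalization performance. The training error is usually somewhat optimistic, because the threshold was tuned on exactly those examples.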

Supervised and Unsupervised Learning

In the previous example a human expert characterized the data, i.e. supplied the label (the class).
Tasks, where the desired output for each object is given, are called supervised and the desired
outputs are called targets. This term stems from the fact that during learning a model can obtain
the correct value from the teacher, the supervisor.
If data has to be processed by machine learning methods, where the desired output is not given,
then the learning task is called unsupervised. In a supervised task one can immediately measure
how well the model performs on the training data, because the optimal outputs, the targets, are given.
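
The contrast can be sketched in a few lines. The data, the fixed threshold of 2.5, and the function names are assumptions chosen for illustration. In the supervised part the targets allow the training error to be computed directly. In the unsupervised part only the inputs are used, and a simple two-cluster k-means procedure (one possible unsupervised method, picked here as an example) finds group structure without any targets.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical 1D measurements from two groups; the second entry is the target.
labeled = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
           + [(rng.gauss(5.0, 1.0), 1) for _ in range(100)])


def predict(x, threshold=2.5):
    """A fixed, hypothetical model: class 0 below the threshold, class 1 above."""
    return 0 if x < threshold else 1


# Supervised: targets are available, so the training error is directly measurable.
training_error = sum(predict(x) != y for x, y in labeled) / len(labeled)


def two_means(xs, iterations=20):
    """1D k-means with k=2, started from the smallest and largest value."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = statistics.fmean(group0), statistics.fmean(group1)
    return c0, c1


# Unsupervised: the targets are dropped and only the inputs are processed.
inputs_only = [x for x, _ in labeled]
centers = two_means(inputs_only)

print(f"training error with targets: {training_error:.3f}")
print(f"cluster centers without targets: {centers[0]:.2f}, {centers[1]:.2f}")
```

Note that the clustering result comes with no built-in error measure: without targets there is nothing to compare the two groups against.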

Reinforcement Learning

There are machine learning methods which do not fit into the unsupervised/supervised classification.
For example, with reinforcement learning the model has to produce a sequence of outputs
based on inputs but only receives a signal, a reward or a penalty, at sequence end or during the sequence.
Each output influences the world in which the model, the actor, is located.
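
The interaction pattern described here can be sketched as a simple loop. It shows only how outputs, world state, and a delayed reward relate; it is not a learning algorithm. The corridor world, the two policies, and all names are hypothetical. The actor starts at position 0, each output moves it one step left or right, and a reward or penalty is returned only when the sequence ends.

```python
import random


def run_episode(policy, goal=5, max_steps=20):
    """Run one sequence; the reward signal arrives only when the sequence ends."""
    position = 0
    for step in range(max_steps):
        action = policy(position)             # model output: -1 (left) or +1 (right)
        position = max(0, position + action)  # each output changes the world
        if position == goal:
            return 1.0, step + 1              # reward: goal reached
    return -1.0, max_steps                    # penalty: goal never reached


def always_right(position):
    """A hypothetical policy that always steps towards the goal."""
    return +1


rng = random.Random(1)


def random_walk(position):
    """A hypothetical policy that steps left or right at random."""
    return rng.choice((-1, +1))


reward_right, steps_right = run_episode(always_right)
reward_random, steps_random = run_episode(random_walk)
print(f"always_right: reward={reward_right}, steps={steps_right}")
print(f"random_walk:  reward={reward_random}, steps={steps_random}")
```

The single signal at the end does not say which of the individual outputs were good or bad. This is what separates the setting from supervised learning, where a target is given for every output.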
