25-04-2012, 12:16 PM
Introduction to statistical pattern recognition
Introduction
This book describes basic pattern recognition procedures, together with practical applications
of the techniques on real-world problems. A strong emphasis is placed on the
statistical theory of discrimination, but clustering also receives some attention. Thus,
the subject matter of this book can be summed up in a single word: ‘classification’,
both supervised (using class information to design a classifier – i.e. discrimination) and
unsupervised (allocating to groups without class information – i.e. clustering).
Pattern recognition as a field of study developed significantly in the 1960s. It was
very much an interdisciplinary subject, covering developments in the areas of statistics,
engineering, artificial intelligence, computer science, psychology and physiology,
among others. Some people entered the field with a real problem to solve.
The basic model
Since many of the techniques we shall describe have been developed over a range of
diverse disciplines, there is naturally a variety of sometimes contradictory terminology.
We shall use the term ‘pattern’ to denote the $p$-dimensional data vector $x = (x_1, \ldots, x_p)^T$
of measurements ($T$ denotes vector transpose), whose components $x_i$ are measurements of
the features of an object. Thus the features are the variables specified by the investigator
and thought to be important for classification. In discrimination, we assume that there
exist $C$ groups or classes, denoted $\omega_1, \ldots, \omega_C$, and associated with each pattern $x$ is a
categorical variable $z$ that denotes the class or group membership; that is, if $z = i$, then
the pattern belongs to $\omega_i$, $i \in \{1, \ldots, C\}$.
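As a rough illustration of this notation, the sketch below stores each pattern as a tuple of p feature measurements paired with its class label z. The feature values, the name design_set and the helper group_by_class are hypothetical and chosen only for this example; they do not come from the book.

```python
from collections import defaultdict

# Hypothetical design set: each entry pairs a pattern x (here p = 2 features)
# with its class label z in {1, ..., C}. The numbers are made up.
design_set = [
    ((5.1, 3.5), 1),
    ((4.9, 3.0), 1),
    ((6.7, 3.1), 2),
    ((6.3, 2.5), 2),
]


def group_by_class(samples):
    """Return a dict mapping each label z to the list of patterns with that label."""
    groups = defaultdict(list)
    for x, z in samples:
        groups[z].append(x)
    return dict(groups)


p = len(design_set[0][0])             # dimension of each pattern vector
classes = group_by_class(design_set)  # patterns collected per class
C = len(classes)                      # number of distinct classes present
```

Keeping the label separate from the feature vector mirrors the text: x carries the measurements, while z only records group membership.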
Stages in a pattern recognition problem
A pattern recognition investigation may consist of several stages, enumerated below.
Further details are given in Appendix D. Not all stages may be present; some may be
merged together so that the distinction between two operations may not be clear, even if
both are carried out; also, there may be some application-specific data processing that may
not be regarded as one of the stages listed. However, the points below are fairly typical.
Supervised versus unsupervised
There are two main divisions of classification: supervised classification (or discrimination)
and unsupervised classification (sometimes in the statistics literature simply referred
to as classification or clustering).
In supervised classification we have a set of data samples (each consisting of measurements
on a set of variables) with associated labels, the class types. These are used
as exemplars in the classifier design.
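To make the role of labelled exemplars concrete, here is a minimal sketch of one very simple supervised rule: assign a new pattern to the class whose mean is nearest. This is only one possible classifier, picked for brevity and not taken from the text; the data and the function names class_means and classify are assumptions made for the example.

```python
import math
from collections import defaultdict


def class_means(samples):
    """Estimate the mean pattern of each class from labelled samples (x, z)."""
    sums = {}
    counts = defaultdict(int)
    for x, z in samples:
        if z not in sums:
            sums[z] = [0.0] * len(x)
        for j, xj in enumerate(x):
            sums[z][j] += xj
        counts[z] += 1
    return {z: tuple(s / counts[z] for s in sums[z]) for z in sums}


def classify(x, means):
    """Return the label whose class mean is closest to x in Euclidean distance."""
    return min(means, key=lambda z: math.dist(x, means[z]))


# Hypothetical labelled design set with two classes and two features.
design_set = [
    ((1.0, 1.0), 1),
    ((1.0, 2.0), 1),
    ((5.0, 5.0), 2),
    ((6.0, 5.0), 2),
]

means = class_means(design_set)       # the "design" step uses the labels
label = classify((1.5, 1.0), means)   # applying the rule to an unlabelled pattern
```

The labels are used only when the class means are estimated; once the rule is designed, a future pattern is classified from its measurements alone.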
Why do we wish to design an automatic means of classifying future data? Cannot
the same method that was used to label the design set be used on the test data? In
some cases this may be possible.