04-12-2012, 06:14 PM
A Tutorial on Principal Component Analysis
A Tutorial on Principal.pdf (Size: 323.65 KB / Downloads: 24)
INTRODUCTION
Principal component analysis (PCA) is a standard tool in modern
data analysis - in diverse fields from neuroscience to computer
graphics - because it is a simple, non-parametric method
for extracting relevant information from confusing data sets.
With minimal effort PCA provides a roadmap for how to reduce
a complex data set to a lower dimension to reveal the
sometimes hidden, simplified structures that often underlie it.
The goal of this tutorial is to provide both an intuitive feel for
PCA, and a thorough discussion of this topic. We will begin
with a simple example and provide an intuitive explanation
of the goal of PCA. We will continue by adding mathematical
rigor to place it within the framework of linear algebra to
provide an explicit solution. We will see how and why PCA
is intimately related to the mathematical technique of singular
value decomposition (SVD). This understanding will lead us
to a prescription for how to apply PCA in the real world and an
appreciation for the underlying assumptions. My hope is that
a thorough understanding of PCA provides a foundation for
approaching the fields of machine learning and dimensional
reduction.
MOTIVATION: A TOY EXAMPLE
Here is the perspective: we are an experimenter. We are trying
to understand some phenomenon by measuring various quantities
(e.g. spectra, voltages, velocities, etc.) in our system.
Unfortunately, we can not figure out what is happening because
the data appears clouded, unclear and even redundant.
This is not a trivial problem, but rather a fundamental obstacle
in empirical science. Examples abound from complex systems
such as neuroscience, web indexing, meteorology and
oceanography - the number of variables to measure can be
unwieldy and at times even deceptive, because the underlying
relationships can often be quite simple.
Take for example a simple toy problem from physics diagrammed
in Figure 1. Pretend we are studying the motion
of the physicist’s ideal spring. This system consists of a ball
of mass m attached to a massless, frictionless spring. The ball
is released a small distance away from equilibrium (i.e. the
spring is stretched). Because the spring is ideal, it oscillates
indefinitely along the x-axis about its equilibrium at a set frequency.
This is a standard problem in physics in which the motion
along the x direction is solved by an explicit function of time.
In other words, the underlying dynamics can be expressed as
a function of a single variable x.
A Naive Basis
With a more precise definition of our goal, we need a more
precise definition of our data as well. We treat every time
sample (or experimental trial) as an individual sample in our
data set. At each time sample we record a set of data consisting
of multiple measurements (e.g. voltage, position, etc.). In
our data set, at one point in time, camera A records a corresponding
ball position (xA,yA). One sample or trial can then
be expressed as a 6 dimensional column vector
Change of Basis
With this rigor we may now state more precisely what PCA
asks: Is there another basis, which is a linear combination of
the original basis, that best re-expresses our data set?
A close reader might have noticed the conspicuous addition of
the word linear. Indeed, PCA makes one stringent but powerful
assumption: linearity. Linearity vastly simplifies the problem
by restricting the set of potential bases. With this assumption
PCA is now limited to re-expressing the data as a linear
combination of its basis vectors.
Let X be the original data set, where each column is a single
sample (or moment in time) of our data set (i.e. ~X). In the toy
example X is an m×n matrix where m = 6 and n = 72000.
Let Y be another m×n matrix related by a linear transformation
P. X is the original recorded data set and Y is a new
representation of that data set.
A MORE GENERAL SOLUTION USING SVD
This section is the most mathematically involved and can be
skipped without much loss of continuity. It is presented solely
for completeness. We derive another algebraic solution for
PCA and in the process, find that PCA is closely related to
singular value decomposition (SVD). In fact, the two are so
intimately related that the names are often used interchangeably.
What we will see though is that SVD is a more general
method of understanding change of basis.
We begin by quickly deriving the decomposition. In the following
section we interpret the decomposition and in the last
section we relate these results to PCA.