More on regression. Gradient descent. Classification
Recall from last time
• The problem of supervised learning: given data D ⊆ X × Y,
find a hypothesis h : X → Y which approximates the given
data well
• Supervised learning algorithms make specific choices about the
hypothesis class, the error function used to evaluate the
approximation, and the algorithm for error minimization
• Linear regression:
– Consider h to be a linear function
– Consider minimizing the mean squared error between h and
the true values on data set D
– Compute the gradient of the MSE and set it to 0
• We obtain a closed-form solution for the parameters
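
A minimal sketch of that closed-form solution (assuming NumPy and a small synthetic dataset; all names here are illustrative, not from the notes). Setting the gradient of the MSE to 0 yields the normal equations, which can be solved directly:

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 2 * x + 1 + 0.1 * rng.standard_normal(20)

# Design matrix with a bias column, so h(x) = w0 + w1 * x
X = np.column_stack([np.ones_like(x), x])

# Gradient of MSE = 0 gives the normal equations:
#   w = (X^T X)^{-1} X^T y
# (solved here without forming an explicit inverse)
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # approximately [1, 2]
```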
Overfitting
• A general, HUGELY IMPORTANT problem for all machine
learning algorithms
• We can find a hypothesis that predicts the training data
perfectly but does not generalize well to new data
• E.g., a lookup table!
• We are seeing an instance here: if we have a lot of parameters,
the hypothesis "memorizes" the data points, but is wild
everywhere else.
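
To make the memorization concrete, here is an illustrative sketch (assuming NumPy and synthetic data): a degree-9 polynomial through 10 points drives the training error to essentially zero, yet its prediction at a fresh input can be wildly off.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(10)

for d in (1, 3, 9):
    # Degree 9 interpolates all 10 points exactly, so its training
    # MSE is ~0 (NumPy may warn that the fit is poorly conditioned)
    coeffs = np.polyfit(x_train, y_train, deg=d)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    # A fresh point between the training inputs probes generalization;
    # the true value sin(2*pi*0.55) is about -0.31
    x_new = 0.55
    print(d, train_mse, np.polyval(coeffs, x_new))
```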
Overfitting more formally
• Every hypothesis has a "true" error J*(h) (measured on all
possible data items we could ever encounter)
• Because we do not have all the data, we measure the error on
the training set JD(h)
• Suppose we compare hypotheses h1 and h2 on the training set,
and JD(h1) < JD(h2)
• If h2 is "truly" better, i.e. J*(h2) < J*(h1), our algorithm is
overfitting.
• We need theoretical and empirical methods to guard against it!
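
In symbols, one standard way to formalize these two errors (assuming squared error and data drawn from an underlying distribution P; this formalization is added here for clarity, not taken from the notes):

```latex
% True error: expected loss over the whole data distribution P
J^{*}(h) = \mathbb{E}_{(x,y) \sim P}\left[ \big(h(x) - y\big)^{2} \right]

% Training error: average loss over the finite training set D
J_{D}(h) = \frac{1}{|D|} \sum_{(x,y) \in D} \big(h(x) - y\big)^{2}
```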
Leave-one-out cross-validation
• How can we choose the best d for an order-d polynomial fit to
the data?
• Repeat the following procedure:
– Leave out one instance from the training set, to estimate the
true prediction error for the best order-d fit for
d ∈ {1, 2, . . . , 9}.
– Use all the other data items for finding w
– Measure the error on the instance left out
– This is an unbiased estimate of the true prediction error
• Choose the d with lowest average prediction error
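
A minimal sketch of this procedure (assuming NumPy and synthetic data; loocv_mse is an illustrative helper name, not from the notes):

```python
import numpy as np

def loocv_mse(x, y, d):
    """Leave-one-out estimate of prediction error for a degree-d fit."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i          # leave out instance i
        coeffs = np.polyfit(x[mask], y[mask], deg=d)  # fit w on the rest
        pred = np.polyval(coeffs, x[i])        # predict the held-out point
        errors.append((pred - y[i]) ** 2)
    return np.mean(errors)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 15))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(15)

# Choose the degree with the lowest average leave-one-out error
best_d = min(range(1, 10), key=lambda d: loocv_mse(x, y, d))
print(best_d)
```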
Cross-validation
• A general procedure for estimating the true error of a predictor
• The data is split into three subsets:
– A training set used only to find the parameters w
– A validation set used to find the right hypothesis class (e.g.
the degree of the polynomial)
– A test set used to report the prediction error of the algorithm
• These sets must be disjoint!
• The process is repeated several times, and the results are
averaged to provide error estimates.
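
A minimal sketch of the three-way split for the polynomial example (assuming NumPy and synthetic data; the 60/20/20 proportions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(60)

# Disjoint split: 60% train, 20% validation, 20% test
idx = rng.permutation(60)
tr, va, te = idx[:36], idx[36:48], idx[48:]

def mse(coeffs, i):
    return np.mean((np.polyval(coeffs, x[i]) - y[i]) ** 2)

# Training set: find the parameters w for each candidate class (degree d)
fits = {d: np.polyfit(x[tr], y[tr], deg=d) for d in range(1, 10)}
# Validation set: choose the hypothesis class
best_d = min(fits, key=lambda d: mse(fits[d], va))
# Test set: report the final prediction error, touched only once
print(best_d, mse(fits[best_d], te))
```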