Abstract - The field of statistics has changed drastically since the introduction of the computer. Computational statistics is now a very active field, with many new statistical methods and algorithms and many interesting applications. One challenging problem is the increasing size and complexity of data sets: new technologies and methods have had to be developed not only for storing and filtering such data, but also for analyzing them. This manuscript is concerned with linear and nonlinear methods for regression and classification; support vector machines (SVMs) in particular form a rich and demanding problem area. Some major approaches and concepts are outlined in this paper.
1 Introduction
Linear and nonlinear model analysis is a statistical methodology that uses the relation between two or more quantitative variables so that one variable can be predicted from the other, or others. This methodology is widely used in business, the social and behavioural sciences, and the biological sciences. Perhaps the most popular mathematical model for making predictions is the multiple linear regression model. You have already studied multiple regression models in the "Data, Models, and Decisions" course. In this note we will build on this knowledge to examine the use of multiple linear regression models in data mining applications. Multiple linear regression is applicable to numerous data mining situations. Examples include: predicting customer activity on credit cards from demographics and historical activity patterns; predicting the time to failure of equipment based on utilization and environmental conditions; predicting expenditures on vacation travel based on historical frequent-flier data; predicting staffing requirements at help desks based on historical data and product and sales information; predicting sales from cross-selling of products from historical information; and predicting the impact of discounts on sales in retail outlets.
2. Basics of Machine Learning
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
The process of machine learning is similar to that of data mining. Both systems search through data to look for patterns. However, instead of extracting data for human comprehension -- as is the case in data mining applications -- machine learning uses that data to detect patterns and adjust program actions accordingly. Machine learning algorithms are often categorized as supervised or unsupervised. Supervised algorithms apply what has been learned from labeled examples in the past to new data; unsupervised algorithms draw inferences from unlabeled datasets.
3.1 Linear Model
Linear models describe a continuous response variable as a function of one or more predictor variables. They can help you understand and predict the behavior of complex systems or analyze experimental, financial and biological data.
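As a minimal sketch (not part of the original manuscript), a linear model with two predictors can be fit by ordinary least squares using NumPy; the data and true coefficients below are invented for illustration:

```python
import numpy as np

# Hypothetical data: a continuous response y driven by two predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(coef)  # approximately [3.0, 2.0, -1.0]
```

With only mild noise, the estimated intercept and slopes recover the coefficients used to generate the data.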
Logistic Regression
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more metric (interval or ratio scale) independent variables.[11]
The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (dependent variable = response or outcome variable) and a set of independent (predictor or explanatory) variables. Logistic regression generates the coefficients (and their standard errors and significance levels) of a formula to predict a logit transformation of the probability of presence of the characteristic of interest:

logit(p) = b0 + b1X1 + b2X2 + ... + bkXk

where p is the probability of presence of the characteristic of interest. The logit transformation is defined as the logged odds:

odds = p / (1 - p)

and

logit(p) = ln(p / (1 - p))
Rather than choosing parameters that minimize the sum of squared errors (like in ordinary regression), estimation in logistic regression chooses parameters that maximize the likelihood of observing the sample values.
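Maximum-likelihood estimation for logistic regression can be sketched with a simple gradient ascent on the log-likelihood (an illustrative implementation, not the paper's; the toy data, learning rate, and iteration count are assumptions):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximum-likelihood fit via gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # intercept term
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w += lr * X.T @ (y - p) / len(y)       # log-likelihood gradient
    return w

# Toy binary data: the outcome depends positively on one predictor.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.2 * rng.normal(size=200) > 0).astype(float)

w = fit_logistic(X, y)
```

The gradient X'(y - p) is zero exactly when the fitted probabilities balance the observed outcomes, which is the maximum-likelihood condition; in practice, statistical packages use faster Newton-type iterations for the same objective.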
5. Perceptron Learning Algorithm
The perceptron learning rule was originally developed by Frank Rosenblatt in the late 1950s. Training patterns are presented to the network's inputs and the output is computed. [3] The connection weights wj are then modified by an amount proportional to the product of
• the difference between the actual output, y, and the desired output, d, and
• the input pattern, x.
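The update rule above can be sketched in a few lines; this is an illustrative implementation with an assumed threshold activation and a toy AND-gate dataset, not code from the manuscript:

```python
import numpy as np

def train_perceptron(X, d, lr=0.1, epochs=20):
    """Rosenblatt's rule: w += lr * (d - y) * x for each pattern."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = 1.0 if (w @ x + b) >= 0 else 0.0  # actual output
            w += lr * (target - y) * x            # proportional to (d - y) and x
            b += lr * (target - y)
    return w, b

# Linearly separable toy data: the AND gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

w, b = train_perceptron(X, d)
```

Because the update is proportional to the error (d - y) and the input x, weights change only when a pattern is misclassified; on linearly separable data such as this, the rule is guaranteed to converge.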