01-01-2013, 10:34 AM
A Short Course in Data Mining
What is Data Mining?
The need for data mining arises when expensive problems in business (manufacturing, engineering, etc.) have no obvious solution. Examples include:
Optimizing a manufacturing process or a product formulation.
Detecting fraudulent transactions.
Assessing risk.
Segmenting customers.
A solution must be found. The options:
Pretend the problem does not exist (denial).
Consult the local psychic.
Use data mining.
Steps in Data Mining
Stage 0: Precise statement of the problem.
Before opening a software package and running an analysis, the analyst must be clear about what question is to be answered. Without a precise formulation of the problem you are trying to solve, you are wasting time and money.
Stage 1: Initial exploration.
This stage usually starts with data preparation, which may involve "cleaning" the data (e.g., identifying and removing incorrectly coded data), transforming the data, selecting subsets of records and, for data sets with large numbers of variables ("fields"), performing preliminary feature selection. Data description and visualization are key components of this stage (e.g., descriptive statistics, correlations, scatterplots, box plots).
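A minimal sketch of this stage in Python, using only the standard library. The field names (`age`, `income`), the sample records and the plausibility rule (age between 18 and 100, non-negative income) are all hypothetical, chosen only to illustrate dropping incorrectly coded records before computing descriptive statistics and a correlation.

```python
import statistics

# Hypothetical raw records; values arrive as strings, as they often do from files.
RAW = [
    {"age": "34", "income": "52000"},
    {"age": "-1", "income": "48000"},   # impossible age: treated as a coding error
    {"age": "41", "income": "n/a"},     # non-numeric income: cannot be used
    {"age": "29", "income": "39000"},
    {"age": "57", "income": "88000"},
    {"age": "45", "income": "61000"},
]

def clean(records):
    """Drop records with non-numeric or implausible values (hypothetical rules)."""
    cleaned = []
    for rec in records:
        try:
            age = int(rec["age"])
            income = float(rec["income"])
        except ValueError:
            continue  # value could not be parsed as a number
        if not 18 <= age <= 100 or income < 0:
            continue  # out-of-range value, assumed to be incorrectly coded
        cleaned.append({"age": age, "income": income})
    return cleaned

def describe(values):
    """Basic descriptive statistics for one numeric field."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

if __name__ == "__main__":
    data = clean(RAW)
    ages = [r["age"] for r in data]
    incomes = [r["income"] for r in data]
    print(describe(ages))
    print(round(pearson(ages, incomes), 3))
```

In a real project the plausibility rules would come from knowledge of how each field was recorded, not from the data itself.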
Stage 2: Model building and validation.
This stage involves building several candidate models and choosing the best one based on their predictive performance, that is, how well they predict cases that were not used to build them.
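One common way to compare models on predictive performance is k-fold cross-validation: each model is fit on part of the data and scored on the part it did not see. The sketch below is an illustration under assumptions, not a prescribed procedure: it compares a hypothetical baseline (always predict the mean) with a one-variable least-squares line, scoring both by cross-validated mean squared error.

```python
import statistics

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def cv_mse(fit, xs, ys, k=5):
    """Mean squared error over all held-out points; each point is held out exactly once."""
    n = len(xs)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point goes into this fold
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return statistics.fmean(errors)

def select_best(candidates, xs, ys, k=5):
    """Return the name of the lowest-error candidate, plus every candidate's score."""
    scores = {name: cv_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Hypothetical data: a straight line plus a small alternating offset.
    xs = list(range(20))
    ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
    best, scores = select_best({"mean": fit_mean, "line": fit_line}, xs, ys)
    print(best, scores)
```

Scoring on held-out points rather than on the training data is what guards against picking a model that merely memorizes the sample.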
Stage 3: Deployment.
When the goal of the data mining project is to predict or classify new cases (e.g., to predict the creditworthiness of individuals applying for loans), the third and final stage typically involves applying the best model or models (determined in the previous stage) to generate predictions for those new cases.
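A minimal sketch of deployment, assuming a hypothetical loan dataset with made-up `income` (in thousands) and `debt_ratio` fields and a label of 1 for applicants who repaid. A one-split decision tree (a "stump") stands in for whatever model won in Stage 2; once chosen, it is simply applied to new applicants.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stump:
    """One-split decision tree: predict `above` if case[feature] > threshold, else `below`."""
    feature: str
    threshold: float
    above: int
    below: int

    def predict(self, case):
        return self.above if case[self.feature] > self.threshold else self.below

def fit_stump(cases, labels, features):
    """Try every feature/threshold/direction; keep the first with the fewest training errors."""
    best, best_err = None, None
    for f in features:
        for t in sorted({c[f] for c in cases}):
            for above in (0, 1):
                stump = Stump(f, t, above, 1 - above)
                err = sum(stump.predict(c) != y for c, y in zip(cases, labels))
                if best_err is None or err < best_err:
                    best, best_err = stump, err
    return best

def score_new_cases(model, new_cases):
    """Deployment: apply the already-chosen model to cases it has never seen."""
    return [model.predict(c) for c in new_cases]

# Hypothetical historical applicants (income in thousands) and outcomes (1 = repaid).
HISTORY = [
    {"income": 25, "debt_ratio": 0.6},
    {"income": 32, "debt_ratio": 0.5},
    {"income": 40, "debt_ratio": 0.4},
    {"income": 55, "debt_ratio": 0.3},
    {"income": 61, "debt_ratio": 0.2},
    {"income": 75, "debt_ratio": 0.1},
]
LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    model = fit_stump(HISTORY, LABELS, ["income", "debt_ratio"])
    new_applicants = [{"income": 30, "debt_ratio": 0.5}, {"income": 70, "debt_ratio": 0.15}]
    print(model)
    print(score_new_cases(model, new_applicants))
```

In this sketch fitting happens once; only `score_new_cases` would need to run for each new batch of applications.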
Model building and validation.
A model is typically rated according to two aspects:
Accuracy
Understandability
These aspects sometimes conflict with one another. Decision trees and linear regression models are simpler than models such as neural networks and boosted trees, and thus easier to understand; however, choosing them may mean giving up some predictive accuracy.
Remember not to confuse the data mining model with reality: a road map is not a perfect representation of the road, but it is still a useful guide.