17-06-2014, 03:49 PM
Oracle Data Mining
This chapter introduces the basics you will need to start using Oracle Data Mining.
This chapter includes the following sections:
• Data Mining in the Database Kernel
• Data Mining Functions
• Data Mining Algorithms
• Data Preparation
• How Do I Use Oracle Data Mining?
• Where Do I Find Information About Oracle Data Mining?
• Oracle Data Mining and Oracle Database Analytics
Data Mining in the Database Kernel
Oracle Data Mining provides comprehensive, state-of-the-art data mining functionality within Oracle Database.
Oracle Data Mining is implemented in the Oracle Database kernel, and mining models are first class database objects. Oracle Data Mining processes use built-in features of Oracle Database to maximize scalability and make efficient use of system resources.
Data mining within Oracle Database offers many advantages:
• No Data Movement. Some data mining products require that the data be exported from a corporate database and converted to a specialized format for mining. With Oracle Data Mining, no data movement or conversion is needed. This makes the entire mining process less complex, time-consuming, and error-prone.
• Security. Your data is protected by the extensive security mechanisms of Oracle Database. Moreover, specific database privileges are needed for different data mining activities. Only users with the appropriate privileges can score (apply) mining models.
• Data Preparation and Administration. Most data must be cleansed, filtered, normalized, sampled, and transformed in various ways before it can be mined. Up to 80% of the effort in a data mining project is often devoted to data preparation. Oracle Data Mining can automatically manage key steps in the data preparation process. Additionally, Oracle Database provides extensive administrative tools for preparing and managing data.
• Ease of Data Refresh. Mining processes within Oracle Database have ready access to refreshed data. Oracle Data Mining can easily deliver mining results based on current data, thereby maximizing its timeliness and relevance.
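As one illustration of the data preparation mentioned above, normalization rescales a numeric attribute to a common range. The following is a minimal Python sketch of min-max normalization under stated assumptions. It is not Oracle Data Mining's own implementation, and the function name and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical customer ages.
ages = [20, 30, 40, 60]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)
```

After normalization, attributes measured on very different scales (for example, age and income) can be compared on equal footing.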
Data Mining Functions
A basic understanding of data mining functions and algorithms is required for using Oracle Data Mining. This section introduces the concept of data mining functions. Algorithms are introduced in "Data Mining Algorithms".
Each data mining function specifies a class of problems that can be modeled and solved. Data mining functions fall generally into two categories: supervised and unsupervised. Notions of supervised and unsupervised learning are derived from the science of machine learning, which has been called a sub-area of artificial intelligence.
Artificial intelligence refers to the implementation and study of systems that exhibit autonomous intelligence or behavior of their own. Machine learning deals with techniques that enable devices to learn from their own performance and modify their own functioning. Data mining applies machine learning concepts to data.
See Also:
Part II, "Mining Functions" for more details about data mining functions
Supervised Data Mining
Supervised learning is also known as directed learning. The learning process is directed by a previously known dependent attribute or target. Directed data mining attempts to explain the behavior of the target as a function of a set of independent attributes or predictors.
Supervised learning generally results in predictive models. This is in contrast to unsupervised learning where the goal is pattern detection.
The building of a supervised model involves training, a process whereby the software analyzes many cases where the target value is already known. In the training process, the model "learns" the logic for making the prediction. For example, a model that seeks to identify the customers who are likely to respond to a promotion must be trained by analyzing the characteristics of many customers who are known to have responded or not responded to a promotion in the past.
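The training idea can be sketched in a few lines of Python. This is a deliberately simple stand-in, not one of the algorithms Oracle Data Mining provides: for each value of a single predictor, it remembers the most common known target value. The function names, attribute names, and sample cases are all hypothetical.

```python
from collections import Counter, defaultdict


def train(cases, predictor, target):
    """Learn, for each predictor value, the most common known target value."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[predictor]][case[target]] += 1
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}


def predict(model, case, predictor, default=None):
    """Apply the learned rule to a case whose target is unknown."""
    return model.get(case[predictor], default)


# Hypothetical history of past promotions: the target "responded" is known.
history = [
    {"age_group": "young", "responded": "yes"},
    {"age_group": "young", "responded": "yes"},
    {"age_group": "young", "responded": "no"},
    {"age_group": "senior", "responded": "no"},
    {"age_group": "senior", "responded": "no"},
]

model = train(history, predictor="age_group", target="responded")
prediction = predict(model, {"age_group": "young"}, predictor="age_group")
print(model, prediction)
```

The model "learns" only from cases where the response is already known, and is then applied to a new customer whose response is not.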
Supervised Learning: Testing
Separate data sets are required for building (training) and testing some predictive models. The build data (training data) and test data must have the same column structure. Typically, one large table or view is split into two data sets: one for building the model, and the other for testing the model.
The process of applying the model to test data helps to determine whether the model, built on one chosen sample, is generalizable to other data. In particular, it helps to detect overfitting, which can occur when the logic of the model fits the build data too closely and therefore has little predictive power on new data.
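Splitting one table into build and test sets might look like the following Python sketch. It assumes the rows fit in memory as a list of dictionaries; the function name, the 25% test fraction, and the sample rows are hypothetical choices, not values prescribed by Oracle Data Mining.

```python
import random


def split_build_test(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and split them into build and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical table: every row has the same columns.
rows = [{"id": i, "income": 1000 * i, "responded": i % 2} for i in range(12)]
build_rows, test_rows = split_build_test(rows)
print(len(build_rows), len(test_rows))
```

Because both sets come from the same source rows, they share the same column structure, and no row appears in both sets.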
Data Mining Algorithms
An algorithm is a mathematical procedure for solving a specific kind of problem. Oracle Data Mining supports at least one algorithm for each data mining function. For some functions, you can choose among several algorithms. For example, Oracle Data Mining supports four classification algorithms.
Each data mining model is produced by a specific algorithm. Some data mining problems can best be solved by using more than one algorithm. This necessitates the development of more than one model. For example, you might first use a feature extraction model to create an optimized set of predictors, then a classification model to make a prediction on the results.
Note:
You can be successful at data mining without understanding the inner workings of each algorithm. However, it is important to understand the general characteristics of the algorithms and their suitability for different kinds of applications.
Data Mining Process
Data mining is an iterative process that typically involves the following phases:
Problem definition
A data mining project starts with the understanding of the business problem. Data mining experts, business experts, and domain experts work closely together to define the project objectives and the requirements from a business perspective. The project objective is then translated into a data mining problem definition.
In the problem definition phase, data mining tools are not yet required.
Data exploration
Domain experts understand the meaning of the metadata. They collect, describe, and explore the data. They also identify quality problems in the data. A frequent exchange with the data mining experts and the business experts from the problem definition phase is vital.
In the data exploration phase, traditional data analysis tools, for example, statistics, are used to explore the data.
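A basic statistical summary of this kind can be sketched with Python's standard `statistics` module. The function name and the sample column are hypothetical, missing values are assumed to be represented as `None`, and the column is assumed to contain at least one non-missing value.

```python
import statistics


def describe_column(values):
    """Summarize a numeric column and count missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


# Hypothetical income column with one missing value.
incomes = [30000, 45000, None, 52000, 45000]
summary = describe_column(incomes)
print(summary)
```

A non-zero "missing" count, or a minimum and maximum that look implausible, is the sort of quality problem this phase is meant to surface.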
Data preparation
Domain experts build the data model for the modeling process. They collect, cleanse, and format the data because some of the mining functions accept data only in a certain format. They also create new derived attributes, for example, an average value.
In the data preparation phase, data is tweaked multiple times in no prescribed order. Typical tasks in this phase include selecting the tables, records, and attributes that prepare the data for the modeling tool. The meaning of the data is not changed.
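Two of the preparation tasks described above, creating a derived attribute and selecting attributes, can be sketched as follows. The record layout, attribute names, and helper functions are hypothetical and only illustrate the idea.

```python
def add_average_purchase(record):
    """Return a copy of the record with a derived average-purchase attribute."""
    purchases = record["purchases"]
    average = sum(purchases) / len(purchases) if purchases else 0.0
    return {**record, "avg_purchase": average}


def select_attributes(record, names):
    """Keep only the attributes chosen for the modeling tool."""
    return {name: record[name] for name in names}


# Hypothetical customer records.
customers = [
    {"id": 1, "name": "A", "purchases": [10.0, 20.0, 30.0]},
    {"id": 2, "name": "B", "purchases": []},
]
prepared = [
    select_attributes(add_average_purchase(c), ["id", "avg_purchase"])
    for c in customers
]
print(prepared)
```

The original records are left unchanged; each step returns a new dictionary, so the steps can be reordered or repeated as the data is refined.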
Modeling
Data mining experts select and apply various mining functions, because different mining functions can be used for the same type of data mining problem. Some of the mining functions require specific data types. The data mining experts must assess each model.
In the modeling phase, a frequent exchange with the domain experts from the data preparation phase is required.
The modeling phase and the evaluation phase are coupled. They can be repeated several times to change parameters until optimal values are achieved. When the final modeling phase is completed, a model of high quality has been built.
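The coupling of modeling and evaluation can be sketched as a loop that tries several parameter values and keeps the best one. The model here is a hypothetical single-threshold rule, chosen only to keep the example short. The candidate values and evaluation cases are invented, and "optimal" means the best among the candidates tried.

```python
def accuracy(threshold, cases):
    """Fraction of cases where 'score >= threshold' matches the known target."""
    correct = sum((score >= threshold) == target for score, target in cases)
    return correct / len(cases)


def tune_threshold(candidates, cases):
    """Evaluate each candidate parameter value and keep the most accurate."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        result = accuracy(threshold, cases)
        if result > best_accuracy:
            best_threshold, best_accuracy = threshold, result
    return best_threshold, best_accuracy


# Hypothetical evaluation cases: (score, known target).
test_cases = [(0.2, False), (0.4, False), (0.6, True), (0.9, True), (0.5, False)]
best_threshold, best_accuracy = tune_threshold([0.3, 0.5, 0.6, 0.8], test_cases)
print(best_threshold, best_accuracy)
```

Each pass through the loop corresponds to one modeling step (choose a parameter) followed by one evaluation step (measure the result against known targets).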
Evaluation
Data mining experts evaluate the model. If the model does not satisfy their expectations, they go back to the modeling phase and rebuild the model by changing its parameters until optimal values are achieved. When they are finally satisfied with the model, they can extract business explanations and evaluate the following questions:
• Does the model achieve the business objective?
• Have all business issues been considered?