21-09-2017, 11:47 AM
Data mining is the computational process of discovering patterns in large datasets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a dataset and transform it into an understandable structure for later use. Apart from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, postprocessing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of the data itself. It is also a buzzword, often applied to any form of large-scale data or information processing (collection, retrieval, storage, analysis, and statistics) as well as to any application of computer decision support systems, including artificial intelligence and business intelligence. The book Data Mining: Practical Machine Learning Tools and Techniques with Java (which mainly covers machine learning material) was originally to be named just Practical Machine Learning, and the term data mining was added only for marketing reasons. Often the more general terms (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which a decision support system can then use to obtain more accurate predictions. Data collection, data preparation, and the interpretation and reporting of results are not part of the data mining step, but do belong to the overall KDD process as additional steps.
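As a rough illustration of the "identify groups, then use them for prediction" idea, here is a minimal sketch of cluster analysis using plain k-means on one-dimensional data. The purchase amounts, the function names `k_means_1d` and `assign`, and the choice of k-means itself are assumptions made for this example. The text above does not prescribe any particular algorithm.

```python
from statistics import mean


def k_means_1d(values, k, iterations=20):
    """Group 1-D values into k clusters with plain k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: pick initial centers spread evenly over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, clusters


def assign(value, centers):
    """Return the index of the discovered group that a new value falls into."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


# Hypothetical purchase amounts containing two obvious groups.
purchases = [12, 15, 14, 13, 16, 95, 102, 98, 101, 99]
centers, clusters = k_means_1d(purchases, k=2)
print("centers:", centers)
print("group of a new purchase of 20:", assign(20, centers))
```

The discovered centers act as the "summary of the input data": a later prediction step only needs the centers, not the raw records, to place a new value in a group.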
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods on samples of a larger population dataset that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used to generate new hypotheses to test against the larger data populations.
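The risk described here can be shown with a small simulation. This is only a sketch under stated assumptions: the data is purely random noise, and the sample sizes (10 and 2000 rows), the number of candidate features (200), and the helper names are all invented for illustration. Because no feature is truly related to the target, any strong correlation found is spurious.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_spurious_correlation(n_rows, n_features, rng):
    """Search many pure-noise features for the one that best 'explains' a noise target."""
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
small = strongest_spurious_correlation(n_rows=10, n_features=200, rng=rng)
large = strongest_spurious_correlation(n_rows=2000, n_features=200, rng=rng)
print(f"best |r| found in a 10-row sample:   {small:.2f}")
print(f"best |r| found in a 2000-row sample: {large:.2f}")
```

On the small sample, searching across many features tends to turn up a seemingly strong pattern by chance alone, while the same search on the larger sample finds almost nothing. This is why a pattern dredged from a small sample is best treated as a hypothesis to check against more data.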