Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science. The overall goal of data mining is to extract information from a data set and transform it into an understandable structure for further use. Besides the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It is also a buzzword and is frequently applied to any form of large-scale data or information processing (collection, retrieval, storage, analysis, and statistics), as well as to any application of computer decision support systems, including artificial intelligence and business intelligence. The book Data Mining: Practical Machine Learning Tools and Techniques with Java (which covers mostly machine learning material) was originally to be named just Practical Machine Learning, and the term data mining was added only for marketing reasons. Often the more general terms (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data and may be used in further analysis, for example in machine learning and predictive analytics. For instance, the data mining step might identify multiple groups in the data, which a decision support system can then use to obtain more accurate prediction results. Neither data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but they do belong to the overall KDD process as additional steps.
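As a minimal sketch of one pattern type named above, association rule mining, the Python below counts how often pairs of items occur together in a small set of transactions (their support) and computes the confidence of a rule. The transaction data and the function names `frequent_pairs` and `confidence` are hypothetical, invented for illustration. This is a brute-force pair count, not the Apriori or FP-growth algorithms typically used on large data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one transaction ("basket").
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (the fraction of transactions
    containing both items) is at least min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e. the fraction
    of transactions containing the antecedent that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for t in with_antecedent if consequent <= t)
    return hits / len(with_antecedent)


print(frequent_pairs(transactions, 0.6))
print(confidence(transactions, {"beer"}, {"diapers"}))
print(confidence(transactions, {"diapers"}, {"beer"}))
```

In this toy data, the rule {beer} -> {diapers} has confidence 1.0, while {diapers} -> {beer} has only 0.75. This illustrates that confidence is not symmetric, which is why rules are usually reported with a direction.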
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used to create new hypotheses to test against the larger data populations.
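One reason dredged patterns are unreliable is that a search tests many candidate patterns, and each test has some chance of looking significant by accident. The sketch below assumes, for simplicity, that m tests are independent and that each has false-positive rate alpha when no real pattern exists. Under these assumptions, the probability of at least one spurious "discovery" is 1 - (1 - alpha)^m. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one spurious 'significant' pattern among m
    independent tests, each with false-positive rate alpha, when no real
    pattern exists: 1 - (1 - alpha)**m (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(alpha: float, m: int) -> float:
    """Expected number of spurious hits among m tests (linearity of
    expectation; this part does not require independence)."""
    return alpha * m


for m in (1, 10, 100):
    print(m, round(familywise_error_rate(0.05, m), 3), expected_false_positives(0.05, m))
```

At alpha = 0.05, testing 10 candidate patterns gives roughly a 40% chance of at least one false positive, and testing 100 gives over 99%, with about 5 spurious hits expected. This is why such patterns are better treated as hypotheses to be tested on the larger population.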