28-03-2012, 11:53 AM
Data Mining
Data Mining.ppt (Size: 50.5 KB / Downloads: 195)
New buzzword, old idea.
Inferring new information from already collected data.
Traditionally the job of data analysts.
Computers have changed this. Combing through data with a machine is far more efficient than eyeballing statistical data.
Data Mining – Two Main Components
Wikipedia definition: “Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.”
Knowledge Discovery: Concrete information gleaned from known data. Information you may not have known, but which is supported by recorded facts. (e.g., the diapers and beer example from the previous presentation)
Knowledge Prediction: Uses known data to forecast future trends, events, etc. (e.g., stock market predictions)
Wikipedia note: “some data mining systems such as neural networks are inherently geared towards prediction and pattern recognition, rather than knowledge discovery.” These include applications in AI and symbol analysis.
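A minimal sketch of the knowledge discovery idea, in the spirit of the diapers and beer example. The baskets, item names, and the helper functions `pair_support` and `confidence` are all hypothetical and invented for illustration; real market-basket tools are considerably more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]

def pair_support(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = pair_support(baskets)
print(pairs.most_common(1))                     # the most frequent item pair
print(confidence(baskets, "diapers", "beer"))   # how often diaper buyers also buy beer
```

The pair counts are the "discovered" fact: nobody asked about beer and diapers specifically, yet the recorded baskets support the relation.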
Data Mining Subtypes
Data Dredging: The process of scanning a data set for relations and then coming up with a hypothesis for the existence of those relations.
Metadata: Data that describes other data. Can describe an individual element, or a collection of elements. Wikipedia example: “In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location.”
Applications for Data Dredging in business include Market and Risk Analysis, as well as trading strategies.
Applications for Science include disaster prediction.
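Data dredging as described above can be sketched as a blind scan over every pair of columns, scoring each pair by correlation. The column names and values below are made up, and Pearson correlation is only one assumed way to score a "relation"; the slides do not name a specific measure.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def strongest_relations(columns, top=3):
    """Score every pair of columns and return the `top` strongest by |r|."""
    scored = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    scored.sort(reverse=True)
    return scored[:top]

# Hypothetical data set: eight noise columns plus one genuinely related pair.
rng = random.Random(0)
columns = {f"col{i}": [rng.random() for _ in range(20)] for i in range(8)}
columns["temperature"] = [float(i) for i in range(20)]
columns["ice_cream_sales"] = [2.0 * i + rng.random() for i in range(20)]

for score, a, b in strongest_relations(columns):
    print(f"{a} ~ {b}: |r| = {score:.3f}")
```

Each high-scoring pair is only a candidate hypothesis. With enough columns, some pairs of pure noise will also score well by chance, so anything found this way should be checked against fresh data.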
Propositional vs. Relational Data
Older data mining methods relied on Propositional Data: data related to a single, central element that can be represented in a vector format (e.g., the purchasing history of a single user). Amazon uses such vectors in its related item suggestions [a multidimensional dot product].
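A rough sketch of how purchase-history vectors and a dot product can drive suggestions. This is not Amazon's actual algorithm; the catalog, the users, and the helpers `most_similar` and `suggest` are hypothetical, and cosine similarity (a normalized dot product) is an assumed choice.

```python
from math import sqrt

# Hypothetical catalog and purchase vectors: 1 = bought, 0 = not bought.
catalog = ["book", "lamp", "tent", "stove"]
purchases = {
    "alice": [1, 0, 1, 1],
    "bob":   [0, 1, 0, 0],
    "carol": [1, 0, 1, 0],
}

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalized by vector lengths; 0.0 if either vector is all zeros."""
    norm = sqrt(dot(u, u)) * sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def most_similar(user, purchases):
    """Return the other user whose purchase vector is closest to `user`'s."""
    others = (name for name in purchases if name != user)
    return max(others, key=lambda name: cosine(purchases[user], purchases[name]))

def suggest(user, purchases, catalog):
    """Suggest items the most similar user bought that `user` has not."""
    peer = most_similar(user, purchases)
    return [
        item
        for item, mine, theirs in zip(catalog, purchases[user], purchases[peer])
        if theirs and not mine
    ]

print(most_similar("carol", purchases))
print(suggest("carol", purchases, catalog))
```

Each user is a single row of numbers, which is what makes the data propositional: nothing about one user's row refers to any other entity.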
Current, advanced data mining methods rely on Relational Data, or data that can be stored and modeled easily through use of relational databases. An example of this would be data used to represent interpersonal relations.
Relational Data is more interesting than Propositional data to miners in the sense that an entity, and all the entities to which it is related, factor into the data inference process.
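To illustrate how related entities can factor into inference, here is a small sketch that guesses a person's interest from the known interests of their friends. The friendship graph, the labels, and the function `infer_interest` are hypothetical, and majority voting is only one assumed inference rule.

```python
from collections import Counter

# Hypothetical interpersonal relations: each person maps to a set of friends.
friends = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "cat", "dan"},
    "cat": {"ann", "ben"},
    "dan": {"ben"},
}

# Interests are known for some people only; ben's is missing.
known_interest = {"ann": "hiking", "cat": "hiking", "dan": "chess"}

def infer_interest(person, friends, known_interest):
    """Guess an interest by majority vote among friends with a known interest.

    Returns None when no friend has a known interest. Ties are broken
    arbitrarily in this sketch.
    """
    votes = Counter(
        known_interest[f] for f in friends.get(person, ()) if f in known_interest
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

print(infer_interest("ben", friends, known_interest))
```

Unlike a single purchase vector, the answer for one entity here depends on data stored about other entities, which is the point the relational approach makes.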
Uses of Data Mining
Health and Science
Protein Folding: Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").
Extra-Terrestrial Intelligence: Scanning satellite receptions for possible transmissions from other planets.
For more information see Stanford’s Folding@home and UC Berkeley’s SETI@home projects. Both involve participation in a widely distributed computing application.