13-08-2012, 02:46 PM
Data Mining Primitives, Languages and System Architecture
Introduction
Motivation: the need to extract useful information and knowledge from large amounts of data (the data explosion problem).
Data Mining tools perform data analysis and may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research.
What is Data Mining?
Data mining refers to extracting or “mining” knowledge from large amounts of data. It is also referred to as Knowledge Discovery in Databases (KDD).
It is a process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.
Architecture of a typical data mining system
Misconception: Data mining systems can autonomously dig out all of the valuable knowledge from a given large database, without human intervention.
Without user intervention, the system would uncover a set of patterns so large that it may even surpass the size of the database. Hence, user interaction is required to direct the mining process.
This user communication with the system is provided by using a set of data mining primitives.
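To see how an unguided search can produce more patterns than there are records, it helps to count candidate itemsets. The sketch below is only an illustration: the function name and the toy database size are assumptions, not figures from the slides.

```python
def count_candidate_itemsets(num_items: int) -> int:
    """Number of non-empty subsets of num_items distinct items (2^n - 1)."""
    return 2 ** num_items - 1


# Hypothetical toy database: 1,000 transactions over 20 distinct items.
num_transactions = 1_000
num_items = 20

candidates = count_candidate_itemsets(num_items)
print(candidates)                     # 1048575
print(candidates > num_transactions)  # True
```

Even with only 20 items, the space of candidate patterns is about a thousand times larger than this toy database, which is why the user has to narrow the search.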
Kind of knowledge to be mined
It is important to specify the knowledge to be mined, as this determines the data mining function to be performed.
Kinds of knowledge include concept description, association, classification, prediction and clustering.
The user can also provide pattern templates, also called metapatterns, metarules or metaqueries, that the discovered patterns must match.
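A pattern template can be thought of as a filter on the shape of the rules the system is allowed to return. The following is a minimal sketch, assuming a very simple template that fixes only the number of predicates on the left-hand side and the predicate on the right-hand side. The `Rule` and `Metarule` classes and the predicate names (`age`, `income`, `buys`, `major`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # predicate names on the left-hand side
    consequent: str    # predicate name on the right-hand side


@dataclass(frozen=True)
class Metarule:
    num_antecedents: int  # how many predicates the left-hand side must have
    consequent: str       # predicate required on the right-hand side

    def matches(self, rule: Rule) -> bool:
        """True if the rule has the shape this template asks for."""
        return (len(rule.antecedent) == self.num_antecedents
                and rule.consequent == self.consequent)


# Hypothetical candidate rules produced by some mining step.
candidate_rules = [
    Rule(("age", "income"), "buys"),
    Rule(("age",), "buys"),
    Rule(("age", "income"), "major"),
]

# Template of the form P(X, Y) AND Q(X, W) => buys(X, Z).
template = Metarule(num_antecedents=2, consequent="buys")
kept = [rule for rule in candidate_rules if template.matches(rule)]
print(kept)
```

Only the first candidate survives, since it is the only rule with two antecedent predicates and `buys` as its consequent.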
Concept hierarchies
Rolling Up - Generalization of data
Allows data to be viewed at more meaningful and explicit levels of abstraction
Makes the data easier to understand
Compresses the data
Requires fewer input/output operations
Drilling Down - Specialization of data
Concept values replaced by lower level concepts
There may be more than one concept hierarchy for a given attribute or dimension, based on different user viewpoints.
Example:
A regional sales manager may prefer the previous concept hierarchy, but a marketing manager might prefer to view location along linguistic lines in order to facilitate the distribution of commercial ads.
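Rolling up, drilling down, and switching between two hierarchies on the same attribute can be sketched in a few lines. The example below is a rough illustration only: the city/province/country hierarchy, the language grouping, and the sales figures are all made up for the example and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical geographic hierarchy: city -> province -> country.
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Montreal": "Quebec",
}
PROVINCE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Quebec": "Canada",
}
# Hypothetical alternative hierarchy on the same attribute: city -> language.
CITY_TO_LANGUAGE = {
    "Vancouver": "English",
    "Victoria": "English",
    "Toronto": "English",
    "Montreal": "French",
}

# Made-up sales totals per city.
sales_by_city = {"Vancouver": 120, "Victoria": 30, "Toronto": 200, "Montreal": 150}


def roll_up(values: dict, mapping: dict) -> dict:
    """Generalize: replace each key by its parent concept and sum the values."""
    totals = defaultdict(int)
    for concept, amount in values.items():
        totals[mapping[concept]] += amount
    return dict(totals)


def drill_down(parent: str, values: dict, mapping: dict) -> dict:
    """Specialize: show the lower-level entries that belong to one parent."""
    return {c: v for c, v in values.items() if mapping[c] == parent}


by_province = roll_up(sales_by_city, CITY_TO_PROVINCE)
by_country = roll_up(by_province, PROVINCE_TO_COUNTRY)
by_language = roll_up(sales_by_city, CITY_TO_LANGUAGE)
bc_detail = drill_down("British Columbia", sales_by_city, CITY_TO_PROVINCE)

print(by_province)
print(by_country)
print(by_language)
print(bc_detail)
```

Note that each roll-up returns fewer entries than its input, which is the compression effect described above, and that the same city-level data supports both the geographic and the linguistic view.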
Presentation and visualization
For data mining to be effective, data mining systems should be able to display the discovered patterns in multiple forms, such as rules, tables, crosstabs (cross-tabulations), pie or bar charts, decision trees, cubes, or other visual representations.
The user must be able to specify the forms of presentation to be used for displaying the discovered patterns.
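One way to picture a user-selectable presentation form is a small dispatcher that renders the same patterns either as rules or as a table. This is a minimal sketch under stated assumptions: the pattern dictionaries, their field names, and the `present` function are hypothetical, and only two of the forms listed above are covered.

```python
def as_rules(patterns: list) -> list:
    """Render each pattern as an if-then rule with its measures."""
    return [
        f"{p['if']} => {p['then']} "
        f"[support={p['support']:.0%}, confidence={p['confidence']:.0%}]"
        for p in patterns
    ]


def as_table(patterns: list) -> list:
    """Render the patterns as fixed-width table rows under a header."""
    lines = [f"{'if':<16}{'then':<16}{'support':>10}{'confidence':>12}"]
    for p in patterns:
        lines.append(
            f"{p['if']:<16}{p['then']:<16}"
            f"{p['support']:>10.0%}{p['confidence']:>12.0%}"
        )
    return lines


def present(patterns: list, form: str) -> str:
    """Display the patterns in the form the user asked for."""
    renderers = {"rules": as_rules, "table": as_table}
    if form not in renderers:
        raise ValueError(f"unsupported presentation form: {form!r}")
    return "\n".join(renderers[form](patterns))


# Made-up discovered patterns, for illustration only.
patterns = [
    {"if": "age=20..29", "then": "buys=laptop", "support": 0.02, "confidence": 0.60},
    {"if": "income=high", "then": "buys=laptop", "support": 0.05, "confidence": 0.75},
]

print(present(patterns, "rules"))
print(present(patterns, "table"))
```

Keeping the renderers separate from the patterns means a new form, such as a chart, can be added without touching the mining step.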
Data mining query languages
A data mining query language must be designed to facilitate flexible and effective knowledge discovery.
Having a query language for data mining may help standardize the development of platforms for data mining systems.
But designing such a language is challenging because data mining covers a wide spectrum of tasks, and each task has different requirements.
Hence, the design of a language requires a deep understanding of the limitations and underlying mechanisms of the various kinds of tasks.
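To make the design problem more concrete, the sketch below models what a single query would need to carry, using only the primitives discussed above (kind of knowledge, pattern template, concept hierarchy, presentation form). It is not the syntax of any actual data mining query language; the `MiningQuery` class, its field names, and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Values taken from the lists earlier in this post.
KINDS_OF_KNOWLEDGE = {
    "concept description", "association", "classification",
    "prediction", "clustering",
}
PRESENTATION_FORMS = {
    "rules", "tables", "crosstabs", "charts", "decision trees", "cubes",
}


@dataclass
class MiningQuery:
    """Hypothetical container for the primitives a user specifies."""
    kind_of_knowledge: str
    presentation: list = field(default_factory=lambda: ["rules"])
    concept_hierarchy: Optional[str] = None  # name of a hierarchy to use
    pattern_template: Optional[str] = None   # metarule text, if any

    def validate(self) -> list:
        """Return a list of problems; an empty list means the query is usable."""
        problems = []
        if self.kind_of_knowledge not in KINDS_OF_KNOWLEDGE:
            problems.append(f"unknown kind of knowledge: {self.kind_of_knowledge!r}")
        for form in self.presentation:
            if form not in PRESENTATION_FORMS:
                problems.append(f"unknown presentation form: {form!r}")
        return problems


good = MiningQuery("association", presentation=["rules", "tables"])
bad = MiningQuery("regression", presentation=["hologram"])

print(good.validate())
print(bad.validate())
```

Even this toy version shows the difficulty: each kind of knowledge would need its own extra parameters, so a real language has to accommodate task-specific requirements without becoming unwieldy.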