09-09-2016, 10:12 AM
Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
Potential Applications
Market analysis and management
Risk analysis and management
Fraud detection and management
Text mining (newsgroups, email, documents) and Web analysis
Intelligent query answering
Data Mining Functionalities
Concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
Multi-dimensional vs. single-dimensional association
age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”)
contains(T, “computer”) → contains(T, “software”)
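Association rules like the two above are commonly judged by their support (the fraction of all transactions that contain both sides of the rule) and their confidence (the fraction of transactions containing the left side that also contain the right side). The following is a minimal sketch of that calculation, not something taken from the slides: the function name rule_stats and the four sample transactions are invented for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item on the left-hand side.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain every item on both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

# Rule: contains(T, "computer") -> contains(T, "software")
support, confidence = rule_stats(transactions, {"computer"}, {"software"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

With this sample data, two of the four transactions contain both items, and two of the three transactions containing "computer" also contain "software".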
Issues in Data mining
Individual Privacy
Data Integrity
Relational vs. multidimensional database structure
Issue of Cost
Mining methodology and user interaction issues
Performance issues
Issues relating to the diversity of database types
Applications
Database analysis and decision support
Market analysis and management
Target Marketing, Customer Relation Management, Market Basket Analysis, Cross Selling, Market Segmentation
Risk analysis and management
Forecasting, Customer Retention, Improved Underwriting, Quality Control, Competitive Analysis
Applications (continued)
Text mining (newsgroups, email, documents) and Web analysis
Intelligent query answering
Sports
Astronomy
Internet Web Surf-Aid
Cluster Analysis
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Grouping a set of data objects into clusters
Clustering is unsupervised classification: no predefined classes
Typical applications
As a stand-alone tool to get insight into data distribution
As a preprocessing step for other algorithms
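To make "grouping a set of data objects into clusters" with no predefined classes concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The slides do not name any particular algorithm, so k-means is only a stand-in. The function name kmeans, the choice of the first k points as starting centroids, and the sample points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means sketch: returns (centroids, labels)."""
    # Assumption: the first k points serve as initial centroids.
    # This is deterministic but naive; real code usually picks them more carefully.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # No centroid moved, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No class labels are supplied at any point; the groups emerge only from the distances between the points, which is what "unsupervised" means here.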
What Is Good Clustering?
A good clustering method will produce high-quality clusters with
high intra-class similarity
low inter-class similarity
The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
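One rough way to check for high intra-class similarity and low inter-class similarity is to compare the average distance between points in the same cluster with the average distance between points in different clusters. The sketch below assumes Euclidean distance as the dissimilarity measure, which the slides do not specify; the function name mean_intra_inter and the sample data are hypothetical.

```python
import math
from itertools import combinations


def _mean(values):
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    intra, inter = [], []
    # Visit every unordered pair of points exactly once.
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        distance = math.dist(p, q)  # Assumption: Euclidean distance.
        if label_p == label_q:
            intra.append(distance)
        else:
            inter.append(distance)
    return _mean(intra), _mean(inter)


# Hypothetical data: two tight clusters that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
intra, inter = mean_intra_inter(points, labels)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

A small within-cluster distance relative to the between-cluster distance suggests compact, well-separated clusters. Changing the distance function changes the verdict, which is why the result depends on the similarity measure used.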
Requirements of Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input parameters
Ability to deal with noise and outliers
Insensitivity to the order of input records
Ability to handle high dimensionality
Incorporation of user-specified constraints
Interpretability and Usability