Data Mining and Clustering Techniques

**seminar ideas** · 30-04-2012, 01:20 PM

Introduction
Data mining, often used as a synonym for “knowledge discovery in databases” (KDD), is the process
of analyzing data from different perspectives and summarizing it into useful information.
It helps users understand the relationships among data items and reveals patterns and
trends hidden in the data. It is often viewed as the process of extracting valid,
previously unknown, non-trivial and useful information from large databases. Data mining
systems can be classified according to the kinds of databases mined, the kinds of
knowledge mined, the techniques used, or the applications they serve. Three important
components of a data mining system are the database, the data mining engine, and the
pattern evaluation module.

Data Mining Techniques
Classification is one of the most important and frequently used techniques in data mining.
It is the process of finding a set of models that describe and distinguish data classes or
concepts. The derived model may be represented in various forms, such as
classification (IF-THEN) rules, decision trees, or neural networks.
A decision tree is a flowchart-like tree structure in which each internal node denotes a
test on an attribute value, each branch represents an outcome of the test, and each leaf
represents a class. Decision trees can easily be converted to classification rules.
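The structure described above, and its conversion to rules, can be sketched in a few lines. This is a minimal illustration assuming a small hand-built tree: the attributes `outlook`, `humidity`, and `windy`, their values, and the helper names `Node`, `Leaf`, `classify`, and `to_rules` are hypothetical, chosen only for this example. Each internal node tests one attribute, each branch is one test outcome, each leaf holds a class, and every root-to-leaf path becomes one IF-THEN rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, "Tree"] = field(default_factory=dict)  # test outcome -> subtree


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: Dict[str, str]) -> str:
    # Follow the branch matching the record's value for each tested attribute
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


def to_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    # Every root-to-leaf path becomes one IF-THEN classification rule
    conditions = conditions or []
    if isinstance(tree, Leaf):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN class = {tree.label}"]
    rules: List[str] = []
    for outcome, subtree in tree.branches.items():
        rules += to_rules(subtree, conditions + [f"{tree.attribute} = {outcome}"])
    return rules


# Hypothetical example tree (attribute names and values are illustrative only)
tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

rules = to_rules(tree)
for rule in rules:
    print(rule)  # first rule: IF outlook = sunny AND humidity = high THEN class = no
print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Because each leaf is reached by exactly one path, the five leaves yield five mutually exclusive rules.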

Cluster Analysis
The concept of clustering has been around for a long time. It has several applications,
particularly in information retrieval and in organizing web resources.
The main purpose of clustering is to locate information and, in the present-day context,
to locate the most relevant electronic resources. Research in clustering eventually led
to automatic indexing, used both to index and to retrieve electronic records. Clustering
is a method of grouping objects that are similar in some characteristics. The ultimate
aim of clustering is to provide a grouping of similar records. Clustering is often
confused with classification, but the two differ: classification assigns objects to
predefined classes, whereas clustering discovers the groups themselves from unlabeled
data.

Basic Clustering Steps
Preprocessing and feature selection

Most clustering models assume that every data item is represented by an n-dimensional
feature vector. This step therefore involves choosing an appropriate set of features and
applying suitable preprocessing and feature extraction to the data items to measure the
values of the chosen features. It is often desirable to choose a subset of all the
available features to reduce the dimensionality of the problem space. This step often
requires a good deal of domain knowledge and data analysis.
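As a rough sketch of this step, the code below applies two common techniques that the text does not name: min-max scaling (so features in large units do not dominate distances) and a simple variance threshold for feature selection (a near-constant feature cannot separate items). Both techniques, the helper names `min_max_scale` and `select_features`, the threshold of 0.01, and the sample data are assumptions for illustration, not the author's method.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    # Rescale every feature (column) to [0, 1] so that features measured in
    # large units do not dominate later distance computations
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in rows]


def select_features(rows: Sequence[Sequence[float]],
                    threshold: float = 0.01) -> Tuple[List[int], List[List[float]]]:
    # Keep only features whose variance exceeds the threshold; a (near-)constant
    # feature cannot help separate items into clusters
    cols = list(zip(*rows))
    keep = [j for j, col in enumerate(cols) if pvariance(col) > threshold]
    return keep, [[row[j] for j in keep] for row in rows]


# Hypothetical items: (height in cm, weight in kg, a constant flag)
raw = [[170.0, 65.0, 1.0], [180.0, 85.0, 1.0], [160.0, 50.0, 1.0], [175.0, 70.0, 1.0]]
scaled = min_max_scale(raw)
keep, reduced = select_features(scaled)
print(keep)  # [0, 1]: the constant third feature is dropped
print(reduced)
```

In practice the choice of features usually comes from domain knowledge, as the text notes; an automatic filter like this only removes features that are obviously uninformative.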

Similarity measure
A similarity measure plays an important role in clustering, where a set of objects is
grouped into several clusters so that similar objects fall in the same cluster and
dissimilar ones in different clusters. In clustering, each object is represented by its
features, and the similarity between two objects is measured by a similarity function:
a function that takes two data items as input and returns a measure of how similar
they are.
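Two widely used choices of similarity function are sketched below, assuming each data item is already a numeric feature vector. The function names are hypothetical, and converting a Euclidean distance into a similarity with 1 / (1 + d) is only one of several conventions.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Dissimilarity: 0 for identical items, grows as the items differ
    return math.dist(a, b)


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # One common way to turn a distance into a similarity in (0, 1]
    return 1.0 / (1.0 + euclidean_distance(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Compares the direction of two feature vectors, ignoring their length;
    # 1.0 means the same direction, 0.0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


print(euclidean_distance((0, 0), (3, 4)))   # 5.0
print(cosine_similarity((1, 2), (2, 4)))    # ~1.0: same direction, different length
```

Euclidean distance suits dense numeric features on comparable scales, while cosine similarity is common for documents represented as term vectors, where vector length mostly reflects document size.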

Clustering algorithm
Clustering algorithms are general schemes that use particular similarity measures as
subroutines. The choice of clustering algorithm depends on the desired properties of
the final clustering, e.g. the relative importance of compactness, parsimony, and
inclusiveness. Other considerations include the usual time and space complexity. A
clustering algorithm attempts to find natural groups of components (or data) based on
some similarity. Centroid-based clustering algorithms also find the centroid of each
group of data. To determine cluster membership, such algorithms evaluate the distance
between a point and the cluster centroids. The output of such an algorithm is
essentially a statistical description of the cluster centroids together with the
number of components in each cluster (2).
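The centroid-based behaviour described above can be illustrated with k-means (Lloyd's algorithm), a standard example that the text does not name explicitly. This is a minimal sketch under stated assumptions: Euclidean distance as the similarity subroutine, a deterministic farthest-first seeding chosen so the example is reproducible, and hypothetical helper names and data. It returns exactly the output the text describes: the cluster centroids and the number of components in each cluster.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    # Deterministic seeding (farthest-first traversal): start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def mean(cluster: Sequence[Point]) -> Point:
    # Centroid = coordinate-wise average of the cluster's members
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def k_means(points: Sequence[Point], k: int, max_iter: int = 100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    # Output: the centroids plus the number of components in each cluster
    return centroids, [len(c) for c in clusters]


# Hypothetical 2-D data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0), (9.0, 9.0)]
centroids, counts = k_means(points, k=2)
print(centroids, counts)  # counts are [3, 4]
```

The number of clusters k must be supplied up front, which is one of the "desired properties of the final clustering" a user has to decide; other algorithm families (hierarchical, density-based) do not require it.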

Result validation
Do the results make sense? If not, we may want to iterate back to some prior stage. It
may also be useful to run a test of clustering tendency to estimate whether clusters are
present at all; note that any clustering algorithm will produce some clusters regardless
of whether natural clusters exist.
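One well-known test of clustering tendency is the Hopkins statistic, used here as an illustrative choice; the text does not specify which test to use. The sketch assumes numeric points and the convention H = Σu / (Σu + Σw), where u are distances from random points in the data's bounding box to the nearest data point and w are distances from sampled data points to their nearest neighbour. Under this convention, values near 0.5 suggest randomly spread data and values near 1 suggest clusters; some references use the reciprocal convention. The function name and test data are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def hopkins(data: Sequence[Point], m: int, seed: int = 0) -> float:
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)); needs m <= len(data)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    # u: distances from m uniformly random points in the data's bounding box
    #    to their nearest real data point
    u = []
    for _ in range(m):
        q = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        u.append(min(math.dist(q, p) for p in data))
    # w: distances from m sampled real points to their nearest *other* real point
    w = []
    for i in rng.sample(range(len(data)), m):
        w.append(min(math.dist(data[i], p) for j, p in enumerate(data) if j != i))
    return sum(u) / (sum(u) + sum(w))


# Hypothetical data: two tight, well-separated blobs vs. an evenly spaced grid
blobs = ([(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
         + [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)])
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]

h_blobs = hopkins(blobs, m=10)
h_grid = hopkins(grid, m=10)
print(f"blobs: {h_blobs:.2f}, grid: {h_grid:.2f}")  # blobs near 1, grid below 0.5
```

The evenly spaced grid scores below 0.5 because it is more regular than random data, yet running a clustering algorithm on it would still return k groups; this is exactly why a tendency check is worth doing before trusting the clusters.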
