15-02-2013, 11:12 AM
To design a fuzzy similarity based self-constructing feature clustering algorithm
INTRODUCTION
In text classification, the dimensionality of the feature vector is usually huge.
Examples are two real-world data sets:
20 Newsgroups and
Reuters21578
Such high dimensionality is a severe obstacle for classification.
Feature reduction addresses this problem.
Two major approaches:
Feature selection
Feature extraction
EXISTING SYSTEM
The parameter K, indicating the desired number of extracted features, has to be specified in advance. This burdens the user, since trial and error is needed until an appropriate number of extracted features is found.
When calculating similarities, the variance of the underlying cluster is not considered. Intuitively, the distribution of the data in a cluster is an important factor in the calculation of similarity.
All words in a cluster have the same degree of contribution to the resulting extracted feature.
A FUZZY SELF-CONSTRUCTING FEATURE CLUSTERING ALGORITHM FOR TEXT CLASSIFICATION
A fuzzy similarity-based self-constructing algorithm is used for feature clustering.
The words in the feature vector of a document set are grouped into clusters, based on a similarity test.
Each cluster is characterized by a membership function with statistical mean and deviation.
The extracted feature corresponding to a cluster is a weighted combination of the words contained in the cluster.
The derived membership functions closely match and properly describe the real distribution of the training data.
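The clustering idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm from the slides: the Gaussian-shaped membership function, the threshold `rho`, the initial deviation `init_dev`, and the names `membership` and `self_construct` are all hypothetical choices, since the slides do not give the formulas.

```python
import numpy as np

def membership(x, mean, dev):
    """Gaussian-shaped membership of pattern x in a cluster (assumed form).

    The per-dimension memberships are multiplied, so the result lies in (0, 1]
    and equals 1 only when x coincides with the cluster mean.
    """
    x, mean, dev = (np.asarray(a, dtype=float) for a in (x, mean, dev))
    return float(np.prod(np.exp(-((x - mean) / dev) ** 2)))

def self_construct(patterns, rho=0.5, init_dev=0.25):
    """Build clusters incrementally; the number of clusters is not given in advance.

    rho and init_dev are hypothetical parameters chosen for this sketch.
    """
    clusters = []  # list of lists of pattern indices
    stats = []     # (mean, deviation) for each cluster
    for i, x in enumerate(patterns):
        sims = [membership(x, m, d) for m, d in stats]
        if sims and max(sims) >= rho:
            # Similarity test passed: join the most similar cluster
            # and refresh its mean and deviation.
            j = int(np.argmax(sims))
            clusters[j].append(i)
            members = np.asarray([patterns[k] for k in clusters[j]], dtype=float)
            # init_dev is added so the deviation never collapses to zero.
            stats[j] = (members.mean(axis=0), members.std(axis=0) + init_dev)
        else:
            # No existing cluster is similar enough: create a new one.
            clusters.append([i])
            stats.append((np.asarray(x, dtype=float), np.full(len(x), init_dev)))
    return clusters, stats

# Three toy word patterns; the first two are close, the third is far away.
patterns = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9]]
clusters, stats = self_construct(patterns)
print(clusters)
```

Because clusters are created only when the similarity test fails, the number of extracted features falls out of the data and the threshold rather than being fixed beforehand.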
FEATURE EXTRACTION
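Feature extraction can be expressed as D' = DT, where D is the original document set, T is a weighting matrix, and D' is the reduced document set.
T is constructed once the word patterns have been grouped into clusters.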
Three weighting approaches:
1. In the hard-weighting approach, each word is allowed to belong to only one cluster, and so it contributes to only one new extracted feature.
2. In the soft-weighting approach, each word is allowed to contribute to all new extracted features.
3. The mixed-weighting approach is a combination of the hard-weighting approach and the soft-weighting approach.
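As a rough illustration of D' = DT under the three weighting approaches, the sketch below builds T from a matrix of word-to-cluster membership degrees. The function name `weighting_matrix`, the blending parameter `gamma`, the linear form of the mixed combination, and the toy numbers are assumptions made for this example; the slides say only that the mixed approach combines the other two.

```python
import numpy as np

def weighting_matrix(mu, mode="hard", gamma=0.5):
    """Build T (words x clusters) from membership degrees mu (words x clusters).

    gamma is a hypothetical blending weight used only for the mixed mode.
    """
    mu = np.asarray(mu, dtype=float)
    # Hard: each word gets weight 1 in its best-matching cluster, 0 elsewhere.
    hard = np.zeros_like(mu)
    hard[np.arange(mu.shape[0]), mu.argmax(axis=1)] = 1.0
    if mode == "hard":
        return hard
    if mode == "soft":
        # Soft: each word contributes to every cluster by its membership degree.
        return mu.copy()
    if mode == "mixed":
        # Mixed: assumed here to be a linear blend of the hard and soft weights.
        return gamma * hard + (1.0 - gamma) * mu
    raise ValueError(f"unknown mode: {mode}")

# Toy data: 3 words, 2 clusters, 2 documents (word counts per document).
mu = np.array([[0.9, 0.1],
               [0.8, 0.3],
               [0.2, 0.7]])
D = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])

D_hard = D @ weighting_matrix(mu, "hard")    # D' = DT
D_soft = D @ weighting_matrix(mu, "soft")
D_mixed = D @ weighting_matrix(mu, "mixed", gamma=0.5)
print(D_hard)
```

With hard weighting, each reduced feature is simply the sum of the counts of the words in one cluster. The soft and mixed variants let a word that sits between clusters influence more than one feature.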