15-05-2012, 01:42 PM
A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification
SPJ 15 A Fuzzy Self-Constructing Feature Clustering Algo.pdf (Size: 5.18 MB / Downloads: 56)
INTRODUCTION
In text classification, the dimensionality of the feature vector is usually huge. For example, 20 Newsgroups [1] and Reuters-21578 top-10 [2], two real-world data sets, both have more than 15,000 features. Such high dimensionality can be a severe obstacle for classification algorithms [3], [4]. To alleviate this difficulty, feature reduction approaches are applied before document classification tasks are performed [5]. Two major approaches, feature selection [6], [7], [8], [9], [10] and feature extraction [11], [12], [13], have been proposed for feature reduction. In general, feature extraction approaches are more effective than feature selection techniques, but they are also more computationally expensive [11], [12], [14]. Developing scalable and efficient feature extraction algorithms for high-dimensional document data sets is therefore highly desirable.
OUR METHOD
Several issues are pertinent to most existing feature clustering methods. First, the parameter k, indicating the desired number of extracted features, has to be specified in advance. This places a burden on the user, since trial and error is needed until an appropriate number of extracted features is found. Second, when calculating similarities, the variance of the underlying cluster is not considered. Intuitively, the distribution of the data in a cluster is an important factor in the calculation of similarity. Third, all words in a cluster have the same degree of contribution to the resulting extracted feature. It may be better to allow more similar words to have larger degrees of contribution. Our feature clustering algorithm is proposed to deal with these issues.
Self-Constructing Clustering
Our clustering algorithm is an incremental, self-constructing learning approach. Word patterns are considered one by one, and the user does not need to know the number of clusters in advance. No clusters exist at the beginning; clusters are created as necessary. For each word pattern, its similarity to each existing cluster is calculated to decide whether it is combined into an existing cluster or a new cluster is created. When a new cluster is created, the corresponding membership function is initialized. When, instead, the word pattern is combined into an existing cluster, the membership function of that cluster is updated accordingly.
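The incremental procedure above can be sketched as follows. This is a minimal illustration, assuming Gaussian-style membership functions with a user-chosen deviation floor `sigma0` and a similarity threshold `rho`; the variable names and the running mean/deviation updates are illustrative, not the paper's exact equations:

```python
import numpy as np

def self_constructing_cluster(patterns, rho=0.5, sigma0=0.3):
    """Incremental self-constructing clustering sketch.

    Each cluster keeps running statistics (count, sum, sum of squares)
    so its Gaussian membership function can be updated on the fly.
    """
    clusters = []  # each: {"n": count, "s": sum, "ss": sum of squares}
    for x in patterns:  # word patterns are considered one by one
        x = np.asarray(x, dtype=float)
        best, best_sim = None, 0.0
        for j, c in enumerate(clusters):
            m = c["s"] / c["n"]                          # cluster mean
            var = np.maximum(c["ss"] / c["n"] - m**2, 0.0)
            sd = np.sqrt(var) + sigma0                   # floor keeps sd > 0
            # product of 1-D Gaussians, written as one exponential
            sim = np.exp(-np.sum(((x - m) / sd) ** 2))
            if sim > best_sim:
                best, best_sim = j, sim
        if best is None or best_sim < rho:
            # no existing cluster is similar enough: create a new one
            clusters.append({"n": 1, "s": x.copy(), "ss": x**2})
        else:
            # combine x into the most similar cluster; the membership
            # function (mean and deviation) is updated via the sums
            c = clusters[best]
            c["n"] += 1
            c["s"] += x
            c["ss"] += x**2
    return clusters
```

Note that the variance of each cluster enters the similarity directly, addressing the second issue raised above: two clusters with the same mean but different spreads assign different memberships to the same word pattern.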
Text Classification
Given a set D of training documents, text classification can be done as follows: We specify the similarity threshold for (16) and apply our clustering algorithm. Assume that k clusters are obtained for the words in the feature vector W. Then we find the weighting matrix T and convert D to D′ by (25). Using D′ as training data, a classifier based on support vector machines (SVM) is built. Note that any classification technique other than SVM could be applied instead; SVM is chosen because Joachims [36] showed that it performs better than other methods for text categorization.
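This pipeline can be sketched roughly as follows. For illustration only, a hard (0/1) weighting matrix T is assumed, where the paper's equation (25) would assign fuzzy degrees, and scikit-learn's `LinearSVC` stands in for the SVM; the function name and `word_to_cluster` input are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

def classify_with_extracted_features(D, labels, word_to_cluster, k):
    """Project documents through a weighting matrix T and train an SVM.

    D: (docs x words) term matrix; word_to_cluster: cluster index per word.
    """
    m = D.shape[1]
    T = np.zeros((m, k))
    for i, j in enumerate(word_to_cluster):
        T[i, j] = 1.0          # hard membership; a fuzzy T would use degrees
    D_prime = D @ T            # reduced training data D' (docs x k)
    clf = LinearSVC().fit(D_prime, labels)  # any classifier could be swapped in
    return clf, T
```

At test time, a new document is converted with the same T before prediction, so training and test data live in the same k-dimensional extracted-feature space.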