16-01-2014, 12:54 PM
A Multiobjective Evolutionary Algorithm for Spam E-mail Filtering
Abstract
Unsolicited Commercial Email, also known as
spam, has been a major problem on the Internet. In
this paper, a well-known Multiobjective Evolutionary
Algorithm, NSGA-II, is used for the first time for spam
e-mail filtering. NSGA-II is adapted to use Genetic
Programming components to obtain a set of filtering
rules with different profiles.
Introduction
Unsolicited Commercial Email, also known as
spam, is commonplace in email communication.
Spam is a major and growing problem.
It is estimated that in the month of May 2006, for
example, 86% of all e-mails sent were spam [1]. Spam
is a costly problem and many experts agree it is only
getting worse [2, 3, 4, 5, 6]. Because of the economics
of spam and the difficulties inherent in stopping it, it is
unlikely to go away soon. Consequently, a large
amount of effort has been expended on devising
effective filters to identify spam e-mails.
In recent years, personalized anti-spam filters in
email client applications, based on content filtering,
have become the standard for spam filters [7, 8]. Spam
filters may be implemented using, among others,
rule-based filters [9], nearest neighbor classifiers [10],
decision trees [11] and Bayesian classifiers [12].
The Query Definition Problem
This is the largest group of applications of
Evolutionary Algorithms (EAs) to Information Retrieval
(IR). Every proposal in this group uses EAs either
as a relevance feedback technique or as an
Inductive Query by Example (IQBE) algorithm. The
basis of relevance feedback lies in the fact that either
users normally formulate queries composed of terms
that do not match the terms used to index the
documents relevant to their needs, or they do not
provide the appropriate weights for the query terms.
The operation mode of modifying the previous query
by adding and removing terms, or by changing the
weights of the existing query terms, taking into
account the relevance judgements of the documents
retrieved by it, is a good way to solve these two
problems and to improve the precision, and especially
the recall, of the previous query [16].
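The query modification described above can be illustrated with a
small sketch. It uses a classic Rocchio-style update, which is a
swapped-in stand-in and not the evolutionary method of the paper.
The function name, the parameters alpha, beta and gamma, and the
token-list document format are all assumptions made for
illustration.

```python
from collections import Counter


def rocchio_update(query, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style relevance feedback (illustrative assumption).

    query: dict mapping term -> weight.
    relevant_docs / nonrelevant_docs: lists of token lists judged by the user.
    Returns a new dict of term -> weight; terms whose weight is not
    positive are removed from the query.
    """
    def centroid(docs):
        # Average term frequency over a set of documents.
        totals = Counter()
        for doc in docs:
            totals.update(doc)
        return {t: c / len(docs) for t, c in totals.items()} if docs else {}

    rel = centroid(relevant_docs)
    non = centroid(nonrelevant_docs)

    new_query = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:  # non-positive terms are dropped (term removal)
            new_query[term] = weight
    return new_query
```

Terms that occur in documents judged relevant are added or
reinforced, while terms that occur only in non-relevant documents
are weakened or removed.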
Conclusion
In this contribution, a MOEA for spam filtering
purposes has been developed. This MOEA has been
called SPAM-NSGA-II-GP.
It has been tested using a public spam dataset, and
the benefits of using MOEAs in the spam filtering (SF)
process under the IQBE paradigm have also been
demonstrated.
SPAM-NSGA-II-GP provides a flexible way to set
a filtering rule profile in an SF system. The user can
decide the “level of anti-spam security” desired in
his/her e-mail client. SPAM-NSGA-II-GP would
continuously learn from the user’s e-mails, obtaining
new filtering rules.
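As a rough illustration of how a user might pick a filtering rule
profile from the set a multiobjective algorithm returns, the sketch
below keeps only the non-dominated rules and then selects one
according to a preference value. The two objectives (precision and
recall), the Rule type, the rule names and the security_level
parameter are assumptions made for this example; the excerpt does
not state which objectives SPAM-NSGA-II-GP optimizes.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical summary of one evolved filtering rule.
    name: str
    precision: float
    recall: float


def dominates(a, b):
    """True if rule a is at least as good as b on both objectives
    and strictly better on at least one (Pareto dominance)."""
    return (a.precision >= b.precision and a.recall >= b.recall
            and (a.precision > b.precision or a.recall > b.recall))


def pareto_front(rules):
    """Keep only the rules that no other rule dominates."""
    return [r for r in rules if not any(dominates(o, r) for o in rules)]


def pick_rule(front, security_level):
    """Choose a rule from the front.

    security_level is assumed to lie in [0, 1]: 0 favours precision
    (few legitimate e-mails blocked), 1 favours recall (more spam caught).
    """
    return max(front, key=lambda r: (1 - security_level) * r.precision
                                    + security_level * r.recall)
```

A weighted sum is only one possible way to map a user preference
onto the front; it is used here because it is short, not because
the paper prescribes it.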