Email Classification Using Data Reduction Method ppt

**seminar flower** · 03-09-2012, 02:18 PM

Abstract

Correctly classifying user emails to prevent the penetration of
spam is an important research issue for anti-spam researchers.
This paper presents an effective and efficient email
classification technique based on a data filtering method. We
introduce an innovative filtering technique that uses an
instance selection method (ISM) to remove uninformative data
instances from the training model before classifying the test
data. The objective of ISM is to identify which instances
(examples, patterns) in email corpora should be selected as
representatives of the entire dataset without significant loss of
information. We used the WEKA interface in our integrated
classification model and tested diverse classification algorithms.
Our empirical studies show high classification accuracy
together with a reduction in false positive instances.
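
The excerpt does not say which rule ISM uses to pick representatives. As a minimal sketch of the general idea, the Python below implements Hart's condensed nearest neighbour (CNN) rule, a classic instance selection method that is swapped in here and is not necessarily the authors' ISM. It keeps only the training instances a 1-nearest-neighbour classifier needs in order to label the whole training set correctly. It assumes each email has already been turned into a numeric feature vector (for example, word counts); the function names are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest_label(x, store):
    """Label of the stored instance closest to x (Euclidean distance)."""
    _, label = min(store, key=lambda item: dist(x, item[0]))
    return label

def condensed_nearest_neighbour(data):
    """Hart's CNN rule (used here as a stand-in instance selection method).

    data: list of (feature_vector, label) pairs, feature vectors as tuples.
    Returns a subset on which 1-NN reproduces the training labels
    (except for identical vectors carrying conflicting labels).
    """
    store = [data[0]]                  # seed with the first instance
    changed = True
    while changed:                     # repeat passes until one adds nothing
        changed = False
        for x, y in data:
            if (x, y) in store:
                continue               # already a representative
            if nearest_label(x, store) != y:
                store.append((x, y))   # misclassified: keep it as a representative
                changed = True
    return store
```

On two well-separated groups of three emails each, the rule keeps one representative per class, because every other instance is already labelled correctly by its nearest kept neighbour. The reduced set then serves as training data for whichever classifier comes next.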

INTRODUCTION

The Internet is becoming an integral part of our everyday
life, and email has become a powerful tool for exchanging
ideas and information, as well as for users’ commercial and
social lives. Due to the increasing volume of unwanted email,
called spam, users as well as Internet Service Providers (ISPs)
are facing multifarious problems. Email spam also poses a
major threat to the security of networked systems. Email
classification can control the problem in a variety of ways.
Detecting spam emails and keeping them out of the email
delivery system allows end users to regain a useful means of
communication. Much research on content-based email
classification has centered on the more sophisticated
classifier-related issues [10]. Currently, machine learning for
email classification is an important research issue. The success
of machine learning techniques in text categorization has led
researchers to explore learning algorithms in spam filtering
[1, 2, 3, 4, 10, 11, 13, 14]. However, despite the increasing
development of anti-spam services and technologies, the
number of spam messages continues to increase rapidly.
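
As an illustration of the kind of learning algorithm applied to content-based spam filtering, the sketch below is a multinomial naive Bayes text classifier with add-one smoothing, written in plain Python. It is not presented as one of the classifiers used in this paper; whitespace tokenization and the function names are simplifying assumptions.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns (priors, word_counts, vocab)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    priors = {label: n / len(docs) for label, n in class_counts.items()}
    return priors, word_counts, vocab

def classify_nb(text, model):
    """Return the label with the highest log posterior score."""
    priors, word_counts, vocab = model
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing so unseen class/word pairs are not log(0)
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.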

RELATED WORKS

In recent years, many researchers have turned their
attention to spam classification using many different
approaches. According to the literature, classification is
considered one of the standard and commonly accepted
methods for stopping spam [10]. This method is effective for
the types of spam currently encountered. The philosophy
behind it is to separate spam from legitimate emails. The
classification approaches can be broadly separated into two
categories: one is based on non-classification algorithms and
the other is based on classification algorithms.

Non-Classification algorithms

Non-classification based methods include heuristic or
rule-based methods, white-listing, black-listing, hash-based
lists and distributed black-lists. Non-classification based
solutions work well because of their simplicity and
relatively short processing time [15]. Another key attraction
is that they do not require a training period. However, in the
context of new filtering technologies and in light of
current spamming techniques, they have several drawbacks.
Since these methods are based on standard rule sets
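
The non-classification methods listed above can be combined into a simple filter that needs no training period. The sketch below is hypothetical: it checks a sender white-list and black-list, a set of hashes of previously reported spam bodies, and a fixed list of heuristic phrases, in that order. All names, and the choice of SHA-256 over a whitespace- and case-normalized body, are illustrative assumptions rather than any specific product's rules.

```python
import hashlib

def body_hash(body):
    """Normalize whitespace and case, then hash, so re-sent copies match."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def rule_based_filter(sender, body, whitelist, blacklist,
                      known_spam_hashes, banned_phrases):
    """Return "ham" or "spam" from fixed lists and rules; nothing is trained.

    banned_phrases are assumed to be given in lowercase.
    """
    if sender in whitelist:
        return "ham"                      # trusted senders always pass
    if sender in blacklist:
        return "spam"                     # known spammers always blocked
    if body_hash(body) in known_spam_hashes:
        return "spam"                     # exact (normalized) copy of reported spam
    text = body.lower()
    if any(phrase in text for phrase in banned_phrases):
        return "spam"                     # heuristic keyword rule
    return "ham"
```

Because the lists and phrases are fixed, a spam message that matches none of them is returned as `"ham"`; keeping such a filter effective means maintaining the lists and rules by hand.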

CONCLUSION AND FUTURE WORK

This paper presents an effective email classification
technique that applies an innovative data filtering technique
to the training model. In our data filtering process, we
used a cluster classifier technique to remove insignificant
instances from our training model. After investigating
different classification algorithms, we chose five
classifiers based on their simulation performance and
applied a meta-learning technique (AdaBoost) on top of each
classifier. Our empirical results show an overall
classification accuracy above 97%. In future work, we plan
to extract features from the dynamic information in regularly
incoming emails and pass them to our classification method
to achieve better performance.
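
The conclusion states that AdaBoost was applied on top of each of the five chosen classifiers, but the excerpt does not name those classifiers or their WEKA settings (WEKA's own implementation is the `AdaBoostM1` meta-classifier). The plain-Python sketch below shows discrete AdaBoost with one-feature decision stumps as an assumed base learner, together with the two quantities the abstract reports: accuracy, and false positives counted as legitimate emails (label -1) flagged as spam (label +1). The feature layout and function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best
    one-feature threshold rule under instance weights w."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                # Weighted error: total weight of instances this rule gets wrong
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost. Labels must be +1 (spam) or -1 (legitimate)."""
    n = len(X)
    w = [1.0 / n] * n                  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)     # guard against division by zero
        if err >= 0.5:
            break                      # no better than chance: stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break                      # perfect on the training data: done
        # Misclassified instances gain weight, correct ones lose it
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]   # renormalize to sum to 1
    return ensemble

def boosted_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def evaluate(ensemble, X, y):
    """Accuracy and count of false positives (legitimate mail flagged as spam)."""
    preds = [boosted_predict(ensemble, x) for x in X]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    false_positives = sum(p == 1 and t == -1 for p, t in zip(preds, y))
    return accuracy, false_positives
```

The re-weighting step is what makes boosting a meta-learning technique: each round's base learner is trained with more weight on the instances that earlier rounds got wrong, and the final vote weights each learner by its `alpha`.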