21-10-2016, 04:07 PM
Abstract
This paper presents a novel approach to overcoming the difficulty and complexity of detecting
and predicting e-banking phishing websites. We propose an intelligent, resilient
and effective model based on association and classification Data Mining
algorithms. These algorithms were used to characterize and identify the factors and
rules needed to classify phishing websites, and the relationships that correlate them
with each other. We implemented six different classification algorithms and techniques
to extract criteria from the phishing training data sets and classify website legitimacy. We also
compared their performance, accuracy, number of rules generated and speed. The rules
generated by the associative classification model showed the relationship between some
important characteristics, such as the URL and Domain Identity and the Security and Encryption
criteria, and the final phishing detection rate. The experimental results demonstrated the
feasibility of using Associative Classification techniques in real applications and their better
performance compared to other traditional classification algorithms.
Introduction
Phishing is the fraudulent practice of sending emails purporting to be from
reputable companies in order to induce individuals to reveal personal information, such
as passwords and credit card numbers, online. Typically, the messages appear to come
from well-known and trustworthy Web sites. Web sites that are frequently spoofed by
phishers include PayPal, eBay, MSN, Yahoo, BestBuy, and America Online. [1] A phishing
expedition, like the fishing expedition it’s named for, is a speculative venture: the phisher
puts out the lure hoping to fool at least a few of the prey that encounter the bait. The
motivation behind this project is to create a resilient and effective method that uses
Data Mining algorithms and tools to detect e-banking phishing websites with an Artificial
Intelligence technique. [2] Associative and classification algorithms can be very useful in
predicting phishing websites.
They can tell us which e-banking phishing website characteristics and indicators are most
important and how they relate to each other. Comparing the different
Classification and Association algorithms used in Data Mining is also an aim of this
study.
By applying different methods for detecting phishing websites, we provide users
with an interface where they can check whether they are dealing with a fraudulent website. Over
the past few years, much attention has been paid to the issue of security and privacy.
Existing literature dealing with the problem of phishing is scarce. We employ a few novel
input features that can assist in discovering phishing attacks with very limited a priori
knowledge about the adversary or the method used to launch a phishing attack.
A report by Gartner estimated the cost at 1,244 dollars per victim, an increase over the
257 dollars cited in its 2004 report. In 2007, Moore and Clayton estimated the number
of phishing victims by examining web server logs. They estimated that 311,449 people
fall for phishing scams annually, costing around 350 million dollars.
Several promising defensive approaches to this problem have been reported earlier. One
approach is to stop phishing at the email level, since most current phishing attacks use
broadcast email (spam) to lure victims to a phishing website. [3] Another approach is to
use security toolbars. The phishing filter in IE7 is a toolbar approach with additional features,
such as blocking the user’s activity on a detected phishing site. A third approach is
to visually differentiate phishing sites from the legitimate sites they spoof. [4] Dynamic
Security Skins proposes using a randomly generated visual hash to customize the browser
window or web form elements to indicate successfully authenticated sites. A fourth
approach is two-factor authentication, which ensures that the user not only knows a
secret but also presents a security token. [5]
Many industrial anti-phishing products use toolbars in Web browsers, but some researchers
have shown that security toolbars don’t effectively prevent phishing attacks. Another
approach is to employ certification, e.g., Microsoft spam privacy. [6] A variant of web
credentials is to use a database or list published by a trusted party, where known phishing
web sites are blacklisted.
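The blacklist approach can be illustrated with a minimal sketch. The blacklist entries and the function names below are hypothetical and chosen only for illustration; a real deployment would load the list from the trusted party that publishes it.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries (assumption: not taken from any real list).
BLACKLIST = {"paypa1-secure.example", "login.bank-update.example"}


def hostname_of(url):
    """Return the lower-cased hostname of a URL, or '' if none can be parsed."""
    host = urlsplit(url).hostname  # urlsplit lower-cases the hostname
    return (host or "").rstrip(".")


def is_blacklisted(url, blacklist=BLACKLIST):
    """True if the URL's host, or any parent domain of it, is on the list."""
    host = hostname_of(url)
    if not host:
        return False
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c" against the list.
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


if __name__ == "__main__":
    for candidate in ("http://PayPa1-Secure.example/login",
                      "https://www.paypal.com/"):
        print(candidate, is_blacklisted(candidate))
```

A lookup of this kind only catches sites that are already known, which is one reason the study turns to learned rules instead.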
Chapter 3
UML Diagrams
3.1 Flow Chart
Flowcharts are a methodology used to analyze, improve, document and manage a
process or program. Flowcharts are helpful for:
1. Aiding understanding of relationships among different process steps
2. Collecting data about a particular process
3. Helping with decision making
4. Measuring the performance of a process
5. Depicting the structure of a process
6. Tracking the process flow
7. Highlighting important steps and eliminating unnecessary steps
Use Case
Use cases define interactions between external actors and the system to attain particular
goals. There are three basic elements that make up a use case:
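Actors: Actors are the types of users that interact with the system.
System: Use cases capture functional requirements that specify the intended behavior
of the system.
Goals: Use cases are typically initiated by a user to fulfill goals, and describe the
activities and variants involved in attaining each goal.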
Methodology
Detecting and identifying phishing websites in real time, particularly for e-banking, is
a complex and dynamic problem involving many factors and criteria. [7] Data Mining
is the process of searching through large amounts of data and picking out relevant
information. It has been described as "the non-trivial extraction of implicit, previously unknown, and
potentially useful information from large datasets".
The approach we use here is to apply data mining algorithms to assess e-banking
phishing website risk on the 27 characteristics and factors that mark a forged website. We
used website data mining classification and association rule approaches in our
new e-banking phishing website detection model to find significant patterns of phishing
characteristics or factors in the e-banking phishing website archive.
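As an illustration of what finding significant patterns with association rules involves, the sketch below computes the two standard measures of a rule, support and confidence. The indicator names and the five toy records are invented for this example and are not drawn from the study's 27 characteristics or its data.

```python
# Toy records: each set lists the indicators observed on one website,
# plus its class. All names and values are invented for illustration.
RECORDS = [
    {"ip_in_url", "no_ssl", "phishing"},
    {"ip_in_url", "no_ssl", "phishing"},
    {"ip_in_url", "phishing"},
    {"no_ssl", "legitimate"},
    {"legitimate"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, records)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(set(antecedent) | set(consequent), records) / base


if __name__ == "__main__":
    for feature in ("ip_in_url", "no_ssl"):
        print(feature, "-> phishing",
              "support:", support({feature, "phishing"}, RECORDS),
              "confidence:", confidence({feature}, {"phishing"}, RECORDS))
```

An associative classifier keeps only rules whose support and confidence exceed chosen thresholds and whose consequent is a class label.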
In particular, we used a number of existing data mining association and classification
techniques, including the JRip, PART, PRISM and C4.5 algorithms, to learn and to
compare the relationships between the different phishing classification features and rules.
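Of the algorithms named above, C4.5 selects the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below shows that one criterion on four invented rows; the feature names are hypothetical, and this is not a full C4.5 implementation nor the tooling used in the study.

```python
from collections import Counter
from math import log2

# Invented toy rows; feature names are hypothetical.
ROWS = [
    {"ip_in_url": "yes", "long_url": "yes", "label": "phishing"},
    {"ip_in_url": "yes", "long_url": "no", "label": "phishing"},
    {"ip_in_url": "no", "long_url": "yes", "label": "legitimate"},
    {"ip_in_url": "no", "long_url": "no", "label": "legitimate"},
]


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, feature, target="label"):
    """Information gain of splitting on feature, divided by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the class label after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[feature] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    for name in ("ip_in_url", "long_url"):
        print(name, gain_ratio(ROWS, name))
```

In this toy data `ip_in_url` separates the two classes perfectly while `long_url` carries no information, so a gain-ratio criterion would split on `ip_in_url` first.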
Conclusion And Future Enhancements
5.0.1 Innovation
The associative classification data mining e-banking phishing website model showed the
significance of two phishing website criteria, (URL and Domain Identity)
and (Security and Encryption), with only a trivial influence of some other criteria
such as 'Page Style and Content' and 'Social Human Factor' on the final phishing rate. This
can help us in building a website phishing detection system.
As for future work, we want to use different classification methods, such as neural networks,
along with Hadoop, Scala and Cassandra, and to experimentally measure and
compare the effect of these different techniques on the final result.