01-03-2013, 12:19 PM
Exploration of Data mining techniques in Fraud Detection: Credit Card
Abstract
Data mining has become a key component of many security initiatives, where it is often used to detect fraud
and to assess risk. It involves the use of data analysis tools to discover previously unknown, valid patterns and
relationships in large data sets. Recent decades have seen massive growth in the use of credit cards as a
transactional medium, and data mining has become increasingly common in both the private and public sectors.
It has been used widely in industries such as banking, insurance, medicine and retailing to reduce costs, enhance
research and increase sales. Credit cards are much safer from theft than cash and are a promising medium for
buying and selling, so their popularity as a means of transaction continues to grow. Fraud detection therefore
involves monitoring the behavior of users or customers in order to estimate, detect or avoid undesirable
behavior in the future. In this paper, we investigate the factors and the various techniques involved in credit
card fraud detection during and after a transaction.
Introduction
Data mining uses data analysis tools to discover unknown, hidden and valid patterns and relationships in
large data sets. These tools include mathematical algorithms, statistical models, and machine learning methods,
that is, algorithms such as neural networks and decision trees that improve their performance automatically
through learning. Data mining consists of the collection and management, analysis and prediction of the
corresponding data sets, and it can be performed on data represented in quantitative, textual or multimedia forms.
Data mining applications can use a range of parameters to examine the data, including association rules,
sequence or path analysis, classification methods, clustering and forecasting.
Detecting Credit Card Fraud (CCF) is a difficult task when using normal procedures, so the development of credit
card fraud detection models has recently become significant in both the academic and the business community.
These models are mostly statistics-driven or artificial-intelligence-based, which gives them the theoretical
advantage of not imposing arbitrary assumptions on the input variables [1]. Timely information on fraudulent
activities is a main goal and a good strategy for banks and industries alike. Because banks maintain many large
databases, however, it is sometimes difficult to gain access to them.
Clustering
Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects that
have similar characteristics. Unlike classification, in which objects are assigned to predefined classes, clustering
both defines the classes and places objects in them [4]. For example, a library holds books on a wide variety of
topics. The challenge is to arrange the books so that readers can find several books on a particular topic without
difficulty. Using clustering, we can keep books that have some similarity in one cluster, or on one shelf, and
label it with a meaningful name. A reader who wants books on a topic then only needs to go to that shelf instead
of searching the whole library. Clustering is thus the method by which like records are grouped together. It is
generally used to give the end user a high-level view of what is going on in the database. Clustering is sometimes
treated as synonymous with segmentation, which most marketing people would say is useful for obtaining a
bird's-eye view of the business.
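As an illustration of grouping like records, the following is a minimal sketch of k-means, one common clustering algorithm. The paper does not say which algorithm it has in mind, so the choice of k-means, the function name `kmeans`, the seeding rule and the two-feature sample records are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical records: (scaled amount, scaled transactions per day).
records = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(records, k=2)
print(labels)     # cluster index for each record
print(centroids)  # one centroid per cluster
```

No class labels are supplied here: the two groups emerge from the distances between records alone, which is the difference from classification noted above.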
Classification of clustering algorithms
Clustering algorithms can themselves be classified into different types. They vary according to whether they
produce overlapping or non-overlapping clusters, and non-overlapping clustering can in turn be viewed as
extrinsic or intrinsic.
Extrinsic techniques use category labels on the items to support the classification process; they are the
traditional supervised classification algorithms, which use a special input training set. Intrinsic techniques,
on the other hand, do not use any a priori category labels but depend only on the adjacency matrix containing
the distances between objects.
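To make the intrinsic case concrete, here is a minimal sketch that clusters items using nothing but a matrix of pairwise distances. It links any two items whose distance is at most a threshold and returns the connected groups (a single-linkage style rule). The rule, the function name `threshold_clusters` and the sample matrix are assumptions chosen for illustration, not a method taken from the paper.

```python
def threshold_clusters(distance, threshold):
    """Cluster items from a symmetric distance matrix, with no category labels.

    Items i and j are linked when distance[i][j] <= threshold; each connected
    group of linked items becomes one cluster. Returns a cluster id per item.
    """
    n = len(distance)
    cluster_of = [None] * n
    next_id = 0
    for start in range(n):
        if cluster_of[start] is not None:
            continue  # already reached from an earlier item
        cluster_of[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over linked items
            i = stack.pop()
            for j in range(n):
                if cluster_of[j] is None and distance[i][j] <= threshold:
                    cluster_of[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster_of

# Hypothetical distances between four objects.
distance = [
    [0, 1, 9, 8],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [8, 9, 2, 0],
]
print(threshold_clusters(distance, threshold=3))  # [0, 0, 1, 1]
```

An extrinsic method would instead require a label for each training item in addition to the distances.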
Neural Networks
A Neural Network (NN) is a collection of “processing nodes” that transfer activity to each other via connections.
Neural networks have been successfully applied in a broad range of supervised and unsupervised learning
applications. There are NN learning algorithms that are capable of forming logical models and that do not
require excessive training times. NN topologies, or architectures, are formed by organizing nodes into layers
and connecting these layers of neurons with modifiable weighted interconnections.
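The layered structure can be sketched as a forward pass, in which each node computes a weighted sum of the previous layer's activity and passes it through an activation function. The sigmoid activation, the function names and the weight values below are assumptions for illustration; the paper does not specify them.

```python
from math import exp

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: weights[n] holds the connection weights into node n."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass activity through each (weights, biases) layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical 3-input network: a hidden layer of 2 nodes, then 1 output node.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                               # output layer
]
print(network_forward([1.0, 0.5, 2.0], layers))
```

The weights are the "modifiable" part: learning consists of changing these numbers while the layer structure stays fixed.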
Neural networks with supervised learning
Feed-Forward Neural Networks (FFNN) can be used to represent an arbitrary non-linear mapping, provided
that we have data exemplifying the mapping as Input-Output (I/O) pairs. The main problem of supervised learning
is to adapt the neural network weights so that the input-output mapping corresponds to the I/O samples
provided. To estimate the density of past behavior in batch mode, we would retrieve the data from the last
x days and adapt the mixing proportions to maximize the probability of past behavior. This could be done for each
subscriber individually [10]. Although this approach at first seems well suited to the task, it requires too
much interaction with the billing system to be used in practice.
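The weight-adaptation problem described in this section can be illustrated with the simplest possible case: a single sigmoid unit trained by gradient descent on I/O pairs. This is a sketch under stated assumptions, not the method of [10]; the learning rate, epoch count, function names and the toy samples (the logical OR function) are all hypothetical.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def predict(weights, bias, inputs):
    """Output of one sigmoid unit for the given inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_unit(samples, learning_rate=0.5, epochs=2000):
    """Adapt weights so that the unit's output matches the I/O samples.

    samples is a list of (inputs, target) pairs with targets in {0, 1}.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Error signal: gradient step for the cross-entropy loss.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Toy I/O pairs (logical OR), standing in for labelled behavior records.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_unit(samples)
for inputs, target in samples:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

A multi-layer FFNN is trained on the same principle, with the error propagated backwards through the hidden layers; that extension is omitted here to keep the sketch short.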