21-06-2012, 04:46 PM
A Comprehensive Survey of Data Mining-based Fraud Detection Research
INTRODUCTION & MOTIVATION
Data mining is about finding insights that are statistically
reliable, previously unknown, and actionable from data (Elkan,
2001). This data must be available, relevant, adequate, and clean.
In addition, the data mining problem must be well defined, not
solvable by query and reporting tools alone, and guided by a data
mining process model (Lavrac et al., 2004).
The term fraud here refers to the abuse of a profit organisation's
system without necessarily leading to direct legal consequences.
In a competitive environment, fraud can become a business-critical
problem if it is prevalent and the prevention procedures are not
fail-safe. Fraud detection, as part of overall fraud control,
automates and helps reduce the manual parts of a screening/checking
process, and has become one of the most established industry and
government applications of data mining.
It is impossible to be absolutely certain about the legitimacy of,
and intention behind, an application or transaction. Given this
reality, the most cost-effective option is to tease out possible
evidence of fraud from the available data using mathematical
algorithms.
Performance Measures
Most fraud departments place a monetary value on predictions to
maximise cost savings/profit in line with their policies. They can
define either explicit cost models (Phua et al., 2004; Chan et al.,
1999; Fawcett and Provost, 1997) or benefit models (Fan et al.,
2004; Wang et al., 2003).
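An explicit cost model typically weighs the expected savings from catching a fraud against the fixed cost of investigating an alert. The sketch below is an illustrative assumption, not a model taken from any of the cited papers; all function names and numbers are invented for illustration.

```python
# Hypothetical explicit cost model: flag a transaction for review only when
# the expected savings from catching fraud outweigh the investigation cost.
# (Illustrative sketch; names and default values are assumptions.)

def expected_savings(p_fraud: float, amount: float, investigation_cost: float) -> float:
    """Expected monetary benefit of investigating one transaction."""
    return p_fraud * amount - investigation_cost

def should_investigate(p_fraud: float, amount: float,
                       investigation_cost: float = 10.0) -> bool:
    """Investigate only when the expected benefit is positive."""
    return expected_savings(p_fraud, amount, investigation_cost) > 0.0
```

Under such a model, a low-probability alert on a small transaction is deliberately ignored, because investigating it would cost more than it could save.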
Cahill et al. (2002) suggest scoring an instance (a phone call) by
its similarity to known fraud examples (fraud styles) divided by
its dissimilarity to known legal examples (the legitimate
telecommunications account).
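One concrete reading of such a score is a likelihood-ratio-style measure: similarity to known fraud examples divided by similarity to known legitimate examples, so that divergence from legal behaviour raises the score. The Gaussian kernel and function names below are illustrative assumptions, not the actual formulation of Cahill et al.

```python
import math

def similarity(x, examples):
    """Mean Gaussian-kernel similarity of feature vector x to a set of examples.
    (Illustrative kernel choice; not from the survey.)"""
    def kernel(a, b):
        return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(kernel(x, e) for e in examples) / len(examples)

def fraud_score(x, fraud_examples, legit_examples, eps=1e-9):
    """Higher when x resembles known fraud and diverges from legitimate behaviour."""
    return similarity(x, fraud_examples) / (similarity(x, legit_examples) + eps)
```

A call whose features sit close to known fraud styles and far from the account's usual behaviour receives a high score and can be ranked for review.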
Hybrid Approaches with Labelled Data
Popular supervised algorithms such as neural networks, Bayesian
networks, and decision trees have been combined or applied in a
sequential fashion to improve results. Chan et al. (1999) use
naive Bayes, C4.5, CART, and RIPPER as base classifiers and
stacking to combine them. They also examine bridging incompatible
data sets from different companies and pruning base classifiers.
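Stacking of the kind described can be sketched with scikit-learn. The original base learners do not all have stock equivalents, so this sketch substitutes GaussianNB for naive Bayes and DecisionTreeClassifier for C4.5/CART (RIPPER has no standard scikit-learn implementation); the synthetic data and parameters are assumptions for illustration.

```python
# Hedged sketch of stacking in the spirit of Chan et al. (1999), using
# scikit-learn stand-ins for the original base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic, class-imbalanced data standing in for fraud/non-fraud labels.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(max_depth=5, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner combines base predictions
    cv=5,  # base-level predictions for the meta-learner come from cross-validation
)
stack.fit(X, y)
```

After fitting, `stack.predict_proba(X)` yields class probabilities that could feed a cost model of the kind discussed under Performance Measures.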
Semi-supervised Approaches with Only Legal (Non-fraud) Data
Kim et al. (2003) implement a novel fraud detection method in
five steps: first, generate rules randomly using the Apriori
association-rules algorithm, increasing diversity with a calendar
schema; second, apply the rules to a database of known legitimate
transactions and discard any rule that matches this data; third,
use the remaining rules to monitor the actual system and discard
any rule that detects no anomalies; fourth, replicate any rule
that detects anomalies, adding tiny random mutations; and fifth,
retain the successful rules. This system has been, and is
currently being, tested for internal fraud by employees within
the retail transaction processing system.
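The filtering steps of this loop can be sketched as follows. Rules are modeled here as simple attribute-value patterns; the rule format, data, and helper names are invented for illustration and are not from Kim et al.

```python
# Illustrative sketch of steps 2-3 of the rule-filtering loop above.
# (Rule representation and data are assumptions, not from the paper.)

def matches(rule, txn):
    """A rule fires when every attribute-value pair appears in the transaction."""
    return all(txn.get(attr) == val for attr, val in rule.items())

def filter_against_legit(rules, legit_txns):
    """Step 2: discard any rule that matches known legitimate transactions."""
    return [r for r in rules if not any(matches(r, t) for t in legit_txns)]

def monitor(rules, live_txns):
    """Step 3: keep only rules that detect at least one anomaly in live data."""
    return [r for r in rules if any(matches(r, t) for t in live_txns)]
```

Steps 4 and 5 would then replicate the surviving rules with small random mutations and retain those that keep firing, evolving the rule population over time.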
Critique of Methods and Techniques
In most real-world fraud detection scenarios, the choice of data
mining techniques depends more on practical issues (operational
requirements, resource constraints, and management commitment to
fraud reduction) than on the technical issues posed by the data.
Other novel commercial fraud detection techniques include
graph-theoretic anomaly detection and Inductive Logic
Programming. There has not been any empirical evaluation of
commercial data mining tools for fraud detection since Abbott
et al. (1998).
Only seven studies report methods that have been (or were)
implemented as actual fraud detection systems: in insurance
(Major and Riedinger, 2002; Cox, 1995), credit card (Dorronsoro
et al., 1997; Ghosh and Reilly, 1994), and telecommunications
(Cortes et al., 2003; Cahill et al., 2002; Cox, 1997). Few fraud
detection studies explicitly utilise temporal information, and
virtually none use spatial information.