DETECTION OF SOFTWARE PLAGIARISM BY PROGRAM DEPENDENCIES GRAPH ANALYSIS

**mkaasees** · 08-09-2016, 10:05 AM

1453712152-PLAGIARISM.docx.pdf (Size: 371.49 KB / Downloads: 4)

INTRODUCTION
Along with the blossom of open source projects comes
the convenience for software plagiarism. Suppose a company
needs to implement a large software product, which,
if done from the scratch, could be timeconsuming. The
developers, if less selfdisciplined, may be tempted to find
a counterpart in some open source projects, rip off the interface
components, like I/O and Graphical User Interfaces
(GUIs), and finally fit the essential components (i.e., core
parts) into their own project with serious disguises. Because
only core parts are plagiarized, which accounts for a
small portion of the whole project, and heavy disguises are
applied, the plagiarism, which we call corepart plagiarism,
is hard to notice. In this paper, we study how to detect
corepart plagiarism both accurately and efficiently. In the
above example, the open source project from which code is
copied is the original program, and the company’s project is
called a plagiarism suspect.

.BACKGROUND
A program dependence graph (PDG) is a graph
representation
of the source code of a procedure [4]. Basic statements,
like variable declarations, assignments, and procedure
calls,
are represented by program vertices in PDGs. Each vertex
has one and only one type, and several important types are
listed in Table 1, which also illustrates how source code is
decomposed and mapped to program vertices. The data
and
control dependencies between statements are represented
by
edges between program vertices in PDGs.
Definition 1
(Control Dependency Edge). There
is a control dependency edge from a “control” vertex to a
second program vertex if the truth of the condition controls
whether the second vertex will be executed.
Definition 2
(Data Dependency Edge). There is a
data dependency edge from program vertex v1 to v2 if
there
is some variable var such that:
• v1 may be assigned to var, either directly or indirectly
through pointers.
• v2 may use the value in var, either directly or indirectly
through pointers.

Plagiarisms disguises
ORIGINAL PROGRAM
A procedure in a program, called join
01 static void
02 make_blank (struct line *blank, int count)
03 {
04 int i;
05 unsigned char *buffer;
06 struct field *fields;
07 blank>nfields = count;
08 blank>buf.size = blank>buf.length = count + 1;
09 blank>buf.buffer = (char*) xmalloc (blank>buf.size);
10 buffer = (unsigned char *) blank>buf.buffer;
11 blank>fields = fields =
(struct field *) xmalloc (sizeof (struct field) * count);
12 for (i = 0; i < count; i++){

Review of Plagiarism Detection
Stringbased: Each statement is treated as a string, and
a program is represented as a sequence of strings.
Two programs are compared to find sequences of same strings [1].
Because blanks and comments are discarded, this kind of algorithms are robust to format
alteration, but fragile to identifier renaming.
ASTbased [Baxter et al. 1998, Kontogiannis et al. 1995]
A program is represented as an Abstract Syntax Tree (AST)
Fragile to statement reordering, control replacement and code insertion
Tokenbased [Kamiya et al. 2002, Prechelt et al. 2002]
Variables of the same type are mapped to the same token
A program is represented as a token string
Fingerprint of token strings is used for robustness [Schleimer et al. 2003]
Partially robust to statement reordering, control replacement and code insertion
Representatives: Moss and JPlag

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Automated exam cell application software	mkaasees	1	1,040	08-01-2018, 10:03 AM Last Post: Raymondnof
	System Analysis (Modeling of the Existing and Proposed System using OOD)	seminar flower	1	2,459	15-09-2017, 03:39 PM Last Post: jaseela123
	DESIGN AND PERFORMANCE ANALYSIS OF OPTICAL CDMA SYSTEM USING NEWLY DESIGNED MULTIWAVE	project girl	1	1,270	15-09-2017, 01:34 PM Last Post: jaseela123
	A Study on Comparative Analysis of Risk and Return with reference to Selected stocks	project maker	1	767	14-09-2017, 10:03 AM Last Post: jaseela123
	Software Requirements Specification for Web Publishing System	study tips	1	972	12-09-2017, 10:34 AM Last Post: jaseela123
	Automated Duplicate Detection for Bug Tracking Systems	seminar flower	1	1,543	07-09-2017, 09:36 AM Last Post: jaseela123
	Analysis of an Anomaly-based Intrusion Detection System for Wireless Sensor Networks	study tips	1	1,049	06-09-2017, 01:16 PM Last Post: jaseela123
	ONLINE OBJECT DETECTION	project topics	1	67,280	06-09-2017, 01:12 PM Last Post: jaseela123
	Content Based Image Retrieval Software Requirement Specifications	seminar surveyer	1	6,061,281	06-09-2017, 10:29 AM Last Post: jaseela123
	Software Defect Association Mining and Defect Correction Effort Prediction	project topics	1	342,050	02-09-2017, 03:21 PM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.