05-11-2016, 04:13 PM
Abstract: Many researchers have shown that duplicated code, or code clones, can be harmful to cost estimation in software engineering and to the evolution of source code. Although this is a serious problem, there is little support for eliminating software clones through refactoring. Unifying and merging duplicated code is particularly challenging when the clones have gone through several modifications after their initial introduction. This paper presents an approach that automatically assesses whether a pair of clones can be safely refactored without changing program behavior. The approach examines the differences between the clones and determines whether they can be safely parameterized without causing any side effects. One benefit of this approach is its negligible computational cost. Finally, a large-scale empirical study has been performed on over a million clone pairs detected by four different clone detection tools in nine open source projects, to investigate how refactorability is affected by different clone properties and tool configuration options.
I. INTRODUCTION
Code duplication is recognized as a serious problem in software systems. It has a negative effect on software maintenance and evolution [1]-[2]. In the last few years, different research communities have developed several techniques to detect and analyze duplicated code [3]. More recent research focuses on clone management activities [4], which include tracing clones in the history of a project, analyzing the consistency of modifications to clones, incrementally updating clone groups as the project evolves, and prioritizing the refactoring of clones. In addition, several researchers have empirically investigated the effect of duplicated code on maintenance effort and cost, error-proneness due to inconsistent updates, software defects, change-proneness, and change propagation. However, there is a lack of tools that can automatically analyze software clones to determine whether they can be safely refactored without changing program behavior. Refactorability analysis is therefore an important but missing feature of clone management. When developers are interested in finding refactoring opportunities for duplicated code, it could be used to filter the clones that can be directly refactored. In this way, maintainers can focus on the parts of the code that can immediately benefit from refactoring, which improves maintainability.
This paper presents an approach that takes as input two clone fragments detected by any tool and applies the following three steps to determine whether they can be refactored without side effects.
Step 1: The approach finds code fragments with identical nesting structures within the input clones; these serve as potential refactoring opportunities. Two code fragments are considered unifiable, and therefore refactorable, only if they share a common nesting structure.
Step 2: The approach explores the search space of alternative mapping solutions to find a mapping between the statements of the code fragments that maximizes the number of mapped statements and minimizes the number of differences between them.
Step 3: The differences between the mapped statements detected in the previous step are examined against a set of preconditions, to determine whether they can be parameterized without changing the program behavior.
II. RELATED WORK
This approach relies on two core program structures:
A. Program Structure Tree
B. Program Dependence Graph
The following notation is also used:
1. Let the set CC represent cloned code: CC = {CC1(C1, filea), CC2(C2, fileb), ..., CCn(Cn, filet)}
2. Let the set UC represent unique contents: UC = {UC(C1), UC(C2), ..., UC(Cm)}
A. Program Structure Tree
The Program Structure Tree (PST) [5] was introduced by Johnson et al. as a hierarchical representation of program structure based on the single-entry single-exit (SESE) regions of the control flow graph. In essence, the PST captures the nesting relationship of SESE regions and chains of sequentially composed SESE regions.
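To make the idea of a nesting structure concrete, the following is a minimal Python sketch. It is not a real PST construction, which works on SESE regions of the control flow graph. As a simplifying assumption, it approximates nesting by the control statements found in the syntax tree of a Python fragment, using the standard `ast` module. The function name `nesting_structure` and the sample fragments are hypothetical.

```python
import ast

# Assumption: these syntax-tree node types stand in for nested regions.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def nesting_structure(node):
    """Return nested tuples describing the control-statement nesting below node."""
    result = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, CONTROL_NODES):
            # A control statement opens a new nesting level.
            result.append((type(child).__name__, nesting_structure(child)))
        else:
            # Other nodes add no level, but may contain control statements.
            result.extend(nesting_structure(child))
    return tuple(result)


fragment_a = """
for item in items:
    if item > limit:
        total += item
    count += 1
"""

fragment_b = """
for value in values:
    if value < floor:
        total -= value
    seen += 1
"""

fragment_c = """
if items:
    for item in items:
        total += item
"""

struct_a = nesting_structure(ast.parse(fragment_a))
struct_b = nesting_structure(ast.parse(fragment_b))
struct_c = nesting_structure(ast.parse(fragment_c))

print(struct_a == struct_b)  # same nesting: a loop containing a conditional
print(struct_a == struct_c)  # different nesting: a conditional containing a loop
```

Under this approximation, `fragment_a` and `fragment_b` differ in identifiers and operators but share the same nesting, while `fragment_c` does not.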
B. Program Dependence Graph
The Program Dependence Graph (PDG) [6] is a directed graph with multiple edge types. In a PDG, the nodes denote the statements of a function or method, and the edges denote control and data flow dependencies between statements. This approach extends the PDG representation in two ways. First, composite variables are introduced to represent the state of the objects referenced in the body of a method, and data dependencies are created for these variables. Second, two more types of edges are added to the PDG to help in the examination of preconditions: anti-dependencies and output dependencies.
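The three kinds of data-related edges can be illustrated with a minimal Python sketch. It is limited to straight-line code and ignores control dependencies and composite variables. It assumes each statement is given with the set of variables it defines and the set it uses. The function `dependence_edges` and the sample statements are hypothetical and not part of the described approach.

```python
def dependence_edges(statements):
    """Compute flow, anti and output dependence edges for straight-line code.

    Each statement is a tuple (text, defs, uses). The result is a list of
    (source_index, target_index, kind, variable) tuples.
    """
    edges = []
    last_def = {}  # variable -> index of the most recent defining statement
    readers = {}   # variable -> statements that read it since that definition
    for j, (_text, defs, uses) in enumerate(statements):
        for var in sorted(uses):
            if var in last_def:
                # Flow dependence: j reads the value written by last_def[var].
                edges.append((last_def[var], j, "flow", var))
            readers.setdefault(var, []).append(j)
        for var in sorted(defs):
            for r in readers.get(var, []):
                if r != j:
                    # Anti-dependence: j overwrites a value that r has read.
                    edges.append((r, j, "anti", var))
            if var in last_def:
                # Output dependence: j overwrites the value written earlier.
                edges.append((last_def[var], j, "output", var))
            last_def[var] = j
            readers[var] = []
    return edges


statements = [
    ("total = 0", {"total"}, set()),
    ("count = len(xs)", {"count"}, {"xs"}),
    ("avg = total / count", {"avg"}, {"total", "count"}),
    ("total = 10", {"total"}, set()),
]

edges = dependence_edges(statements)
for edge in edges:
    print(edge)
```

In this example, the last statement cannot be moved above the third one without changing the result, which is what the anti-dependence edge between them records.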
The proposed system makes it easy to search for code clones with greater scalability. The hybrid model incorporates a server module that indexes the training dataset into its database. The client module performs clone similarity estimation on an input code file. Type I clone resolution using the Hadoop MapReduce algorithm and techniques such as inverted indexing can effectively handle exact clones, as seen in Table 5.1. The parallel efficiency of MapReduce is affected by overheads such as synchronization and communication costs. The advantage of MapReduce is that it is scalable and fault tolerant, and it performs best as data volume grows.
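The inverted-indexing idea for exact (Type I) clones can be sketched in a few lines of single-process Python. This sketch does not use Hadoop. Building the index plays the role a map phase would play (emit a key per code window), and grouping by key plays the role of a reduce phase. The window size, the whitespace-only normalization, and all function and file names are assumptions made for illustration.

```python
from collections import defaultdict


def normalize(line):
    """Collapse whitespace so that layout differences do not hide exact clones."""
    return " ".join(line.split())


def build_inverted_index(files, window=3):
    """Map each window of consecutive non-blank lines to its locations."""
    index = defaultdict(list)
    for name, source in files.items():
        lines = [
            (number, normalize(text))
            for number, text in enumerate(source.splitlines(), start=1)
        ]
        lines = [(number, text) for number, text in lines if text]
        for i in range(len(lines) - window + 1):
            chunk = lines[i:i + window]
            key = "\n".join(text for _, text in chunk)
            # Record the file and the starting line of this window.
            index[key].append((name, chunk[0][0]))
    return index


def exact_clone_candidates(index):
    """Keep only the windows that occur at more than one location."""
    return {key: locs for key, locs in index.items() if len(locs) > 1}


files = {
    "a.py": "x = 1\ny  =  x + 2\nprint(y)\nz = 0\n",
    "b.py": "q = 5\nx = 1\ny = x + 2\nprint(y)\n",
}

clones = exact_clone_candidates(build_inverted_index(files))
for key, locations in clones.items():
    print(locations)
```

Here the three-line window starting at line 1 of `a.py` and line 2 of `b.py` is reported as a candidate, despite the extra spaces in `a.py`.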
III. PROBLEM STATEMENT
Duplicated code is a significant concern in software systems. Many research studies have shown that clones can be harmful to the maintenance and evolution of source code. Although this is a serious problem, there is little support for eliminating software clones through refactoring.
IV. MOTIVATION
Although code duplication is a serious problem in software, there is little support for eliminating software clones through refactoring. Unifying and merging duplicated code is particularly challenging when the clones have gone through several modifications after their initial introduction. This motivates an approach that automatically assesses whether a clone pair can be safely refactored without changing the behavior of the program.
V. OBJECTIVE
The first objective is to overcome precondition violations by making changes that facilitate the successful unification of the clones. Developers could then be assisted with more thorough and advanced automated guidance driven by the detected precondition violations.
VI. PROPOSED SYSTEM
This approach processes two different forms of input:
1) Two code fragments reported as clones by a clone detection tool, located within the body of the same method or in different methods.
2) Two method declarations that are considered duplicates, or that contain duplicated code fragments somewhere inside their bodies.
The approach assesses refactorability in three major steps:
1) Nesting Structure Matching: The nesting structure of the input clone fragments is analyzed to find maximal isomorphic subtrees. It is assumed that two code fragments can be unified only if they have an identical nesting structure. Each matched subtree pair is further investigated as a separate clone refactoring opportunity in the next steps.
2) Statement Mapping: The statements within the subtree pairs extracted in the previous step are mapped in a divide-and-conquer fashion. Taking advantage of the identical nesting structure of the isomorphic subtrees, the global mapping problem is divided into smaller sub-problems. For each sub-problem, the corresponding Program Dependence subgraphs are mapped by applying a Maximum Common Subgraph (MCS) algorithm. At the end, the sub-solutions are combined to give the global mapping solution.
3) Precondition Examination: A set of preconditions regarding the preservation of program behavior is examined, based on the differences between the mapped statements in the global solution as well as on the statements that have not been mapped. If no preconditions are violated, the clone fragments corresponding to the mapped statements can be safely refactored, and are thus considered refactorable.
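The goal of the statement mapping step, which is to map as many statements as possible with as few differences as possible, can be illustrated with a small Python sketch. It does not implement an MCS algorithm on Program Dependence subgraphs. Instead, as a simplifying assumption, it enumerates all mappings between two short statement lists by brute force and compares statements token by token. The functions `tokens`, `differences` and `best_mapping`, and the sample blocks, are hypothetical.

```python
import re
from itertools import permutations


def tokens(statement):
    """Split a statement into identifiers, numbers and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", statement)


def differences(stmt_a, stmt_b):
    """Return differing token pairs, or None if the statements cannot be mapped."""
    ta, tb = tokens(stmt_a), tokens(stmt_b)
    if len(ta) != len(tb):
        return None  # assumption: different shapes are treated as unmappable
    return [(x, y) for x, y in zip(ta, tb) if x != y]


def best_mapping(block_a, block_b):
    """Pick the mapping with the most mapped pairs, then the fewest differences."""
    if len(block_a) > len(block_b):
        raise ValueError("this sketch assumes block_a is not longer than block_b")
    best, best_score = [], None
    for perm in permutations(range(len(block_b)), len(block_a)):
        pairs = []
        for i, j in enumerate(perm):
            diff = differences(block_a[i], block_b[j])
            if diff is not None:
                pairs.append((i, j, diff))
        # Lower score is better: more mapped pairs first, then fewer differences.
        score = (-len(pairs), sum(len(d) for _, _, d in pairs))
        if best_score is None or score < best_score:
            best, best_score = pairs, score
    return best


block_a = ["total = total + price", "count = count + 1"]
block_b = ["n = n + 1", "sum = sum + cost"]

mapping = best_mapping(block_a, block_b)
for i, j, diff in mapping:
    print(block_a[i], "<->", block_b[j], diff)
```

The differing token pairs of the chosen mapping are the candidates that the precondition examination would then check before parameterizing them. The brute-force search is only practical for very small blocks, which is one reason for dividing the global problem into smaller sub-problems.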
VII. FUTURE SCOPE
Code clone refactoring is expected not only to help remove bugs from the source code but also to improve code reusability.
VIII. CONCLUSION
This work described a systematic literature review conducted to examine the current state of knowledge about clone evolution, with the research goal of detecting software clones. Primary studies were identified according to the review protocol, and the issues they raise were examined. The aim is to show that the research performed here is effective at revealing code clones. The implementation covers the targeted clone types and compares multiple code files, reporting the clones it finds. It also resolves the identified ambiguity, resulting in a methodology that improves accuracy by mapping the dissimilarities between classified clones. Because it involves code reduction and hybrid machine learning based on supervised and unsupervised learning, the system shows good performance and scalability.