Abstract:
Delivery of reliable software has become a central issue for successful software development organizations. Although advances in the software testing process have ensured better coverage, it is evident that some parts of a software system tend to be more susceptible to defects than others, and identifying these can greatly help developers deliver software of a higher standard. Various techniques can be employed to ensure the quality of software.
Defect Prevention is a vital task in any software project. It involves studying defects that were encountered before and taking specific measures to prevent such defects from recurring. Defect Detection is used in many software projects to identify defects, document them, and analyze them so as to refine and improve the product. The defect document is an input for defect analysis. In Defect Analysis, analyzing a defect leads to identifying its root cause and finding a solution, thereby preventing it from propagating into further development stages. Defect analysis generally classifies defects into categories and finds their possible causes. In Defect Prediction, the goal is to anticipate and prevent defects before they occur. A defect prediction model is built by training a learner on software metrics; such models can later be used to predict defective classes in a software system. This paper aims to throw light on software defect prevention techniques and discuss their efficiency. These methods allow early, low-cost detection of defects, preventing them from propagating to later development stages and inhibiting the manifestation of similar defects in future projects.
Introduction:
Any software delivered to the consumer should be defect free. Therefore, finding defects and their remedies plays an important role in the software development process. The ability to predict defects is also of primary importance in this regard. All these factors must be taken into consideration during the software testing phase. Software testing is employed in the software development process to find any occurrence of defects in the software. Numerous defect analysis and prediction techniques are used, which facilitate defect prevention. In software defect prediction, we predict the occurrence of defects by observing existing trends and defect occurrences. There are numerous models that help in predicting the occurrence of defects, and different prediction models are employed in different scenarios. Defect analysis is the process where already existing defect data is recorded and then sampled for analysis. A thorough analysis is done with the collected data and the defects are studied in order to improve the overall software quality. Defect prevention is crucial because deliverable software should be free of defects; through meticulous defect prediction and analysis, defect prevention is achieved.
Related Work:
Over the years different processes have been implemented in software defect prediction. We briefly discuss the models of defect prediction studies in the last fifty years.
The first study estimating the number of defects was conducted in 1971 by Akiyama. Based on the assumption that complex source code could cause defects, he built a simple model using lines of code (LOC), since LOC might represent the complexity of a software system. However, LOC is too simple a metric to capture system complexity. To address this, McCabe and Halstead suggested the cyclomatic complexity metric and the Halstead complexity metrics in 1976 and 1977, respectively.
That said, the models studied in that period were actually fitting models that investigated the correlation between metrics and the number of defects, not prediction models: they were not validated on new software modules. To resolve this shortcoming, Shen et al. built a linear regression model and tested it on new program modules. However, Munson et al. maintained that the regression techniques of the time were not accurate and proposed a classification model that classified modules into two groups, high risk and low risk. The classification model obtained 92% accuracy on their subject system. However, Munson et al.'s study still had several shortcomings, such as the lack of metrics for object-oriented (OO) systems and the few resources available for extracting development process data. As Shen et al. pointed out, at that time it was not possible to collect error fix information, which was recorded informally by individual developers in unit testing phases.
In terms of OO systems, Chidamber and Kemerer put forward several object-oriented metrics in 1994, and these were used by Basili et al. to predict defects in object-oriented systems. In the 1990s, version control systems were also gaining favor; development history accumulated in software repositories, so that various process metrics were suggested from the mid-2000s onwards.
In the 2000s, there were a number of limitations to defect prediction. The first limitation was that a prediction model could only be used before the product release, so that quality could be guaranteed at that point; it would be much more helpful to predict defects whenever the source code is changed. To make this possible, researchers proposed defect prediction models for changes. Recently, these models have come to be called just-in-time (JIT) defect prediction models, and they have been studied by many researchers over the past few years. The second limitation was that it was burdensome to build a prediction model for new projects or for projects with little historical data.
As the use of process metrics became popular, this limitation became one of the most difficult problems in software defect prediction studies. To rectify this issue, researchers put forward various cross-project defect prediction models. In cross-project defect prediction, deciding when cross-prediction is feasible was another issue, so Zimmermann et al. and He et al. studied cross-prediction feasibility. The third limitation was the question of whether defect prediction models are really helpful; studies have been conducted in this direction, such as case studies and proposals of practical applications. There have also been studies following trends and patterns in IT: by using social network analysis and/or network measures, new metrics were recommended by Zimmermann et al., and privacy issues of defect datasets were addressed by Peters et al. Recently, new kinds of prediction models have been presented, such as the personalized defect prediction model and the universal model.
1. Defect Prevention Strategy:
How do we go about the process of defect prevention? Ideally, the best approach would be to eliminate defects altogether, but given the limitations of current technology, that is impossible. The defect prevention process should start with defect identification, which can be done through testing or code inspection. The problems identified are logged into a problem database. Deeper analysis of the defects can help identify their root causes, which in turn helps the project team take appropriate countermeasures.
Before defect identification, a few steps can be taken to avoid defects. Firstly, software requirements should be correctly translated into product specifications; statistically speaking, errors in software requirements and software design are more frequent than errors in the source code itself. Code review is another essential activity: it helps reduce defects such as faulty algorithm implementations, incorrect logic, or missing conditions. The next step is logging the defects. Defect logging with a correct and detailed description provides the structure needed to resolve a defect easily. Next, analyze the defects: examining defects and their origins supports root cause analysis, which finds and eliminates the cause and thus prevents recurrence. Now, how does defect prediction fit into all this? Although defect prediction is a relatively new domain of research, it is a valuable technique for defect prevention. Software defect prediction is the process of tracing defective components in software before testing is carried out. Defect prediction techniques use models to predict the defects in a software component; these models can be used to obtain fault predictions for components under development, and the development team can use this data to allocate its resources efficiently and enhance the software.
2. Software defects:
A defect is an undesired element in software, the result of an error in the software development process. It might be anything from an error in the code to an incorrectly specified requirement. It is a discrepancy or imperfection that arises in a software work product (SWP), where the SWP results from the set of methods and transformations adopted by the people who develop and maintain the software. A final deliverable should be defect free. To ensure that the software is deliverable, meticulous testing has to be carried out to make sure it is free from defects and guarded against future defect occurrences. This is a crucial part of the software engineering process. We first predict the defects most likely to occur using a variety of defect prediction techniques, and we try to prevent defects from occurring through defect prevention methodologies. Defect analysis is also carried out to identify the defects and their causes, so that future occurrences may be prevented.
3. Defect Detection:
Defect detection is a necessary technique in defect prevention. It is used in software development projects to discover and document defects in order to improve software quality. Defects detected late in the development process require a lot of rework and testing effort, usually under time pressure, leading to dubious product quality. Early, low-cost detection of defects is required if this is to be avoided.
A defect can originate in any of the phases in a software development project. Irrespective of the developmental approach used, each phase in a project is vulnerable to a defect originating in it. If the defect originates at earlier phases, it is highly probable that it will propagate onto the later phases.
3.1 Purpose
In order to prevent defects from becoming too expensive and risky, activities such as inspections or testing are used. The activity might contribute to validation, which is assuring that the correct system is developed, or to verification, which is assuring that the system meets its specification. The primary objective of both inspection and testing is to find defects. Testing reveals a defect’s presence through its manifestation as a failure during execution. Inspections point directly at the underlying fault.
3.2 Technique
How do we go about detecting a defect? Out of the various techniques available, which one is optimal? As mentioned before, a defect can occur at any phase in a development cycle, and no one technique is supreme over the others for all the phases.
A defect might originate in one development stage and be detected in the same or a later stage. Techniques that can be used to detect a defect are inspection and testing; within testing, there is structural and functional testing. J.H. van Moll, J.C. Jacobs, B. Freimut, and J.J.M. Trienekens state the importance of using defect detection techniques for each phase in a life cycle model [12], thus preventing most defects from propagating to later stages.
3.3 Types of Defects
1. Requirement defects:
These defects occur when incorrect requirements are established. The best way to detect them is inspection; testing can prove costly, because it means developing a system on a bad set of requirements and then, when it fails, having to redevelop it.
2. Design defects:
These defects occur when the system is improperly designed. Various empirical studies [10],[11] have confirmed that inspections were significantly more effective and efficient than testing. A defect detected during design inspection is cheaper to correct than one detected in function testing, because the cost of rework in the latter is significantly higher.
3. Code defects:
For these types of defects, functional or structural testing is found to be better than inspection. Some studies claim that testing and inspection find different kinds of code defects and hence can be used together to complement each other.
3.4 Issues of defect detection:
When components are reused in projects to minimize the development time, the defects in those components, if not rectified using different test conditions, can propagate into further phases, causing the need for more extensive and expensive rework.
When a single project has its components developed by separate teams, any problems related to a component manifest only at, or just before, acceptance of the component by the system project. It is extremely difficult for the overall system project to anticipate the quality of the component, and the other components must be adjusted accordingly once the component is accepted.
4. Defect Analysis:
Defect analysis is of primary importance in software testing. All existing defect data is recorded and then sampled for analysis. The identified defects are treated as data and analyzed thoroughly. Defect analysis generally classifies defects and studies them in order to improve the overall quality of the software. Its main aim is to aid defect identification and to help select the correct measures for resolving each particular defect.
4.1 Root cause analysis:
Root cause analysis is a major form of defect analysis. As the name suggests, it identifies the primary (root) cause of a defect and then provides methods to rectify it. Using root cause analysis, we should be able to catch defects in the earlier stages of software development rather than finding them in later stages, where rectification is harder and costlier. Root cause analysis also facilitates early identification of recurring defects. People who have sound knowledge of the occurring defects, their consequences, and their prevention should be involved, to make sure that the defects are properly analyzed and rectified using efficient techniques.
Root cause analysis can be carried out with the help of various tools, the prominent one being the Ishikawa diagram (also known as the fishbone or cause-and-effect diagram), which aids defect prevention by identifying the root cause of each occurring defect. By thoroughly studying the defects that have occurred, root cause analysis helps in employing methods that prevent defects in subsequent stages of software development. It is used to classify the causes that contribute to a given situation: all identified causes are grouped into categories so that a defect or class of defects can be isolated and remedies found efficiently. This throws further light on the causes and gives insight into defect behavior and occurrence.
4.2 Pareto Analysis:
Pareto analysis is a technique for defect analysis. It is also called a Pareto graph or Pareto chart. It is basically a bar graph with the y-axis representing frequency and the x-axis denoting the defect category. It shows the defect with the highest occurrence and provides insight into which categories of defects recur in the software; recurring defects should therefore be given higher priority and treated first, ahead of less frequent ones. The Pareto graph is used:
• When we want to find recurring problems or categories of problems
• To assign priority to one problem out of many
• When looking at a wide range of causes by examining their individual components
• When communicating defect data to third parties
Pareto analysis is easier to understand than other techniques and is a lot simpler when it comes to representation; it is also cost efficient. Nothing, however, comes without limitations. Pareto analysis uses only past data and scores to predict defects, and sole reliance on past data is not advisable. It does not indicate which specific problems need more attention and improvement, so it is not a good basis for focused decisions. Also, statistical testing of the data cannot be carried out with Pareto analysis.
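The frequency counting behind a Pareto chart can be sketched in a few lines of Python; the defect categories and counts below are invented purely for demonstration:

```python
from collections import Counter

# Hypothetical defect log: each entry is the category of one logged defect.
defect_log = [
    "logic", "logic", "interface", "logic", "data",
    "logic", "interface", "logic", "data", "logic",
    "interface", "checking", "logic", "interface", "logic",
]

def pareto_table(categories):
    """Return (category, count, cumulative %) rows sorted by frequency."""
    counts = Counter(categories).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for cat, n in counts:
        running += n
        rows.append((cat, n, round(100 * running / total, 1)))
    return rows

for cat, n, cum in pareto_table(defect_log):
    print(f"{cat:10s} {n:3d} {cum:6.1f}%")
```

Sorting by frequency and tracking the cumulative percentage is exactly what makes the chart useful for prioritization: the first one or two rows typically cover most of the logged defects.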
4.3 Static Code Analysis:
Static code analysis is carried out as part of code review, generally in the implementation phase. It helps in finding where defects are likely to be located, and supports upfront detection of defect-prone components that are difficult to maintain, indicating that these components deserve more attention. The set of components to be studied is selected, the properties of the components are measured, and a thorough statistical analysis is carried out on the data. After this, the identification phase takes place: the components that exhibit defect characteristics are given the highest priority. Statistical methods and techniques such as regression analysis are widely used to predict the extent to which a component might be defect prone, aiding decision making. This is a very useful technique for finding errors that might otherwise be missed by other techniques. Data flow analysis, which detects data flow irregularities, is also a method of static analysis.
Improper usage of static analysis tools can cause problems. Sometimes the tools produce so many warnings or messages that developers, busy resolving them, stop paying attention to the final output. If the tools take too long to run, the people involved in development do not bother to run them at all. The approach can also be costly, as the analysis is carried out using software tools.
4.4 Orthogonal Defect Classification:
Orthogonal Defect Classification (ODC) is a widely used defect analysis technique. It gathers information from the defect stream of a software project to give valuable insights about the defect data in the stream, and it can reduce the time taken to perform root cause analysis by a factor of ten. ODC converts the semantics in the defect stream into a measurable system: the semantics of each software defect are recorded, and analyzing this data provides the insights and diagnostics needed for assessing the different phases of the software life cycle.
To perform ODC, all the activities performed must first be listed. Defects are classified based on the trigger type; triggers play a crucial part in understanding what takes place during a test cycle. Organizations define their activities but often do not define their triggers. ODC triggers can be obtained from the defect tracking logs produced during the defect analysis phase. The classification is carried out based on a variety of factors, and each trigger is then mapped to the testing activity or set of activities it exercises. If the same trigger shows up in several activities, this indicates overlap between those activities, which can be used to focus the activities to be carried out. Having a well-defined process is of immense help in carrying out the analysis; it is not mandatory, however, as long as complete defect information is available, which suffices for a basic level of analysis. Nevertheless, a well-defined process greatly improves it.
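The trigger-to-activity mapping described above can be sketched as follows; the activity and trigger names are hypothetical, not taken from any particular ODC catalogue:

```python
# Hypothetical mapping from test activities to the ODC triggers observed
# in their defect-tracking logs; all names are illustrative only.
activity_triggers = {
    "unit test":     {"coverage", "variation"},
    "function test": {"variation", "sequencing", "interaction"},
    "system test":   {"workload", "interaction", "recovery"},
}

def overlapping_triggers(mapping):
    """Return triggers that show up in more than one activity."""
    seen = {}
    for activity, triggers in mapping.items():
        for t in triggers:
            seen.setdefault(t, []).append(activity)
    return {t: acts for t, acts in seen.items() if len(acts) > 1}

overlaps = overlapping_triggers(activity_triggers)
# Here "variation" and "interaction" each appear in two activities,
# signalling overlap between those test phases.
```

Triggers that surface in multiple activities are exactly the overlaps the text mentions, and they flag test phases whose scopes should be reconciled.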
4.5 ODC & Root Cause Analysis:
Root cause analysis focuses in detail on each individual defect. It is very useful when we try to find the main cause behind a defect and the effect it has, and it does this well when a single defect is taken into consideration. Performing root cause analysis on all defects therefore consumes a lot of time, manpower, and resources, and because it works at the root level of problems, prioritization also becomes an issue. ODC, on the other hand, performs this analysis rapidly and with prioritization, and its resource consumption is minimal compared with full root cause analysis.
5. Defect Prediction
Defect prediction is of primary importance in ensuring the quality and reliability of the software being developed. Early defect prediction considerably reduces the work involved, the complexity of the problem, and the cost of solving it. There is a huge difference in the time, money, and complexity involved in predicting a defect in the earlier phases of a project versus identifying it after the software has been delivered, with the latter proving to be ten times more costly and complex than the former. The rework needed to identify and fix defects requires a lot of manpower and time. The accuracy of a defect prediction technique is determined by how well it detects the defective components of a software product without raising false alarms: an increasing false alarm rate wastes the time of developers who inspect the flagged modules, while labeling a defective module as defect free leads to increased cost and decreased software quality.
Companies usually seek to minimize their efforts and avoid monotonous, strenuous work; hence defect prediction is of prime importance in software engineering.
5.1. Software Metrics:
A software metric is a standard measure of the degree to which a software system or process possesses some property; it measures some characteristic of a software module. The nature of software quality engineering is to look into the relationship between various metrics and final product quality, since software metrics are quantitative and have proved important in the prediction of defects.
They can be classified into three categories: product, process and project metrics.
a. Product metrics describe the attributes of a product, such as size, design, complexity, performance, features, and quality level.
b. Process metrics can be used to improve software development and maintenance.
c. Project metrics cover factors such as the number of developers and their skill level, the schedule, the project size, and the structure of the organization, all of which affect the quality of the product.
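As a rough illustration of product metrics, the sketch below counts non-blank lines and approximates McCabe's cyclomatic complexity by counting branching constructs in Python source. Real metric tools are considerably more careful about which constructs count as decisions; this is a simplified assumption:

```python
import ast

SOURCE = '''
def classify(x):
    if x < 0:
        return "negative"
    for _ in range(3):
        if x % 2 == 0:
            x += 1
    return "done"
'''

# Branching constructs treated as decision points (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp, ast.ExceptHandler)

def rough_complexity(source):
    """Return (approx. cyclomatic complexity, non-blank LOC) for source."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    loc = len([ln for ln in source.splitlines() if ln.strip()])
    return decisions + 1, loc

complexity, loc = rough_complexity(SOURCE)
```

Here the two `if` statements and one `for` loop give three decision points, so the approximate complexity is 4 over 7 non-blank lines — the kind of raw numbers a size-and-complexity prediction model consumes.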
5.2. Data Mining and Machine Learning Techniques:
Data Mining along with machine learning techniques can be employed in order to find out the defects in a software product. These techniques are applied on the software repositories and the defects in the product are extracted.
5.2.1 Machine Learning Techniques
Machine learning algorithms are effective for problem domains that are complex to understand or that continuously change with respect to a variety of factors. Some of the most widely used techniques are decision trees, Bayesian belief networks, Naïve Bayes classification, and clustering techniques. Machine learning generates models of the program properties that are most likely to cause errors; these algorithms find defects by detecting faulty runs during execution. Decision tree learning and support vector machines classify and analyze subsets of the program properties, and this analysis groups the defect-causing conditions. The faulty properties are used to generate a model, and all properties with a high probability of causing defects are identified so that preventive measures can be taken. Clustering over function call profiles is used to distinguish failures from non-failures: the features are analyzed, and those that enable a model to separate failures from non-failures are determined. Invariant detection is also used for defect prediction; dynamic invariant detection infers invariants from a test set and checks whether any violations have occurred, which would indicate a faulty state.
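Dynamic invariant detection, mentioned above, can be illustrated with a toy sketch (in the spirit of tools such as Daikon, though far simpler): learn a value-range invariant from passing runs, then flag runs that violate it. The run data is invented for demonstration:

```python
def learn_range_invariant(passing_runs):
    """Infer lo <= x <= hi from the values observed in passing runs."""
    values = [v for run in passing_runs for v in run]
    return min(values), max(values)

def violates(invariant, run):
    """True if any value in the run falls outside the learned range."""
    lo, hi = invariant
    return any(v < lo or v > hi for v in run)

passing = [[1, 3, 5], [2, 4], [0, 5]]
inv = learn_range_invariant(passing)   # learned invariant: 0 <= x <= 5
suspect = [2, 9, 4]                    # the value 9 breaks the invariant
```

A violation does not prove a defect, but it points at a program state the passing tests never produced, which is exactly the signal such techniques use to flag a potentially faulty state.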
5.2.2 Data Mining
Data Mining is the extraction of interesting patterns of data from a huge amount of data. This is used to observe and find out patterns in huge data sets. The goal of the data mining process is to extract the raw information from a collection of data and convert it into an easily understandable form for further observation and analysis.
Data Mining can be classified into two tasks namely predictive tasks and descriptive tasks. The job of the predictive task is to guess the value of a particular attribute based on the value of other attributes. It takes the values of other attributes into consideration and predicts the value of the particular attribute which is required. The job of the descriptive task is to observe and derive patterns from data. It observes various trends, patterns, similarities and trajectories that summarize the relationships between the data.
There are several data mining techniques used in the defect prediction process which are discussed below.
1. Regression: A statistical process for estimating the relationships among variables. It analyses the relation between a dependent variable and one or more independent variables, expressed by an equation in which the response variable is a linear function of the predictor variable:
Linear Regression: Y = a + bX + u, where a is the intercept, b the slope, and u the error term.
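A minimal sketch of fitting such a linear model by ordinary least squares, on invented module data (size in KLOC versus defects found), might look like this:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Invented sample data: module size (KLOC) and defects found.
kloc    = [1.0, 2.0, 3.0, 4.0, 5.0]
defects = [3.0, 5.0, 7.0, 9.0, 11.0]   # perfectly linear: Y = 1 + 2X
a, b = fit_linear(kloc, defects)
predicted = a + b * 6.0                # extrapolate to a 6-KLOC module
```

With fitted coefficients in hand, the model predicts defect counts for modules it has not seen, which is the predictive task described above.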
2. Association Rule Mining: A method for discovering interesting links between variables in large databases. It is about finding correlations among sets of items in a database, and normally deals with finding rules that predict the occurrence of an item based on the occurrence of other items.
3. Clustering: A way to categorize a set of items into groups whose members are similar in some way. It is the task of grouping a set of items such that intra-cluster similarity is maximized and inter-cluster similarity is minimized.
4. Classification: Using a given input, we predict the outcome of an event. Classification methods make use of input data generally referred to as the training set, in which all objects have known class labels. The main idea of classification is to analyze the training data, learn from it by observing patterns, and build a model. This model is then used to classify test data whose class labels are not known. The various classification techniques are as below.
a. Neural Networks: Neural networks are nonlinear predictive models that can learn through training and resemble biological neural networks in structure. A neural network has interconnected processing elements known as neurons that work together in unison within the network to produce an output.
b. Decision Trees: A decision tree is a model used for prediction and forecasting. Both classification and regression models can be represented in tree form using decision trees. It is a tree with decision nodes and leaf nodes: a decision node has two or more branches, and a leaf node specifies a decision. It acts as a hierarchical classification of decisions and their consequences.
c. Naive Bayes: Based on Bayes' theorem with an independence assumption among predictors. The Naive Bayes classifier is based on the hypothesis that the presence or absence of a particular feature of a class is unrelated to the presence or absence of any other feature.
d. Support Vector Machines:
Support vector machines (SVM) are based on decision planes, which define decision boundaries: a decision plane separates objects with different class memberships. SVM performs classification by constructing a hyperplane in a multidimensional space that separates cases with different class labels, and it supports both classification and regression.
e. Case Based Reasoning: Solving new problems based on similar past problems, using old cases to explain new situations. It works by comparing new, unclassified records with known, existing examples. A simple example of a case-based learning algorithm is the k-nearest neighbor algorithm, which initially stores all available cases and then classifies new cases by their similarity to stored cases (i.e., by a distance function).
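A minimal k-nearest-neighbor sketch over invented module metrics illustrates the case-based idea; the metric values and the choice of Euclidean distance are assumptions for demonstration only:

```python
import math

# Stored cases: ((loc, complexity), label), with label 1 = defective.
cases = [
    ((100, 2), 0), ((150, 3), 0), ((120, 2), 0),
    ((900, 15), 1), ((800, 12), 1), ((950, 18), 1),
]

def knn_classify(query, cases, k=3):
    """Majority vote among the k stored cases closest to the query."""
    nearest = sorted(cases, key=lambda c: math.dist(c[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes > k / 2 else 0

label = knn_classify((870, 14), cases)   # falls among the defective cases
```

The algorithm does no training beyond storing cases: each new module is judged by its nearest past cases, which is exactly the "old cases explain new situations" principle described above.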
5.3. Issues and Problems
Problem with Selecting the Right Set of Metrics
• Studies on the accuracy of defect prediction models have focused on either product metrics or project metrics, but not on their combined impact [1].
• It is strongly believed that software size has a relationship with software quality, but there is no evidence that size metrics are good indicators of defects [2].
• Defect prediction model did not conclude that either code metrics or change metrics were better for defect removal [3].
• Managers rely on complexity metrics to allocate the quality assurance resources effectively, but complexity metrics fail to predict critical binaries of a complex system [4].
• The effectiveness of Software Science metrics as defect predictors in object-oriented software needs to be established [5].
5.4. Prediction Model using size and complexity metrics
The approach that uses size and complexity metrics is among the most popular models of defect prediction. This model uses the program code as the basis for predicting defects; specifically, lines of code (LOC) are used along with the complexity model developed by McCabe. Using regression equations, simple prediction estimates can be obtained for a dependent variable (D) defined as the sum of the defects found during testing and in the two months after release. Akiyama famously proposed four such equations; we illustrate the one that includes the LOC metric:
D = 4.86 + 0.018 L    (1)
Gaffney refined equation (1) into another prediction equation. He argued that the relationship was not language dependent, owing to an optimal size for individual modules with regard to defect density. His regression equation is:
D = 4.2 + 0.0015 L^(4/3)    (2)
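Both equations can be evaluated directly; the coefficients come from the equations above, while the sample LOC values are arbitrary:

```python
def akiyama(loc):
    """Akiyama's model: D = 4.86 + 0.018 * L."""
    return 4.86 + 0.018 * loc

def gaffney(loc):
    """Gaffney's model: D = 4.2 + 0.0015 * L**(4/3)."""
    return 4.2 + 0.0015 * loc ** (4 / 3)

for loc in (100, 1000, 10000):
    print(f"{loc:6d} LOC -> Akiyama {akiyama(loc):7.1f}, "
          f"Gaffney {gaffney(loc):7.1f}")
```

Note how the two models diverge as size grows: Akiyama's estimate rises linearly with LOC, while Gaffney's superlinear exponent makes large modules look disproportionately defect prone.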
The size and complexity models presume that defects are a direct function of size, or that defects occur due to program complexity. These models ignore the underlying causal effects of programmers and designers, the human factors who actually introduce the defects, so attribution for any flawed code depends on the individual to a certain extent. Problem difficulty or poor design capability may result in highly complex programs: complex problems might require complex solutions, and naive programmers might create spaghetti code [8].
5.5. Machine Learning Based Models
Machine learning (ML) algorithms have established great practical importance in resolving a wide range of engineering problems, including the prediction of failures, errors, and defect-proneness, as system software grows more complex. ML algorithms are very useful where human knowledge is imperfect, problem domains are not well defined, and robust adaptation to changing conditions is needed. ML encompasses different types of learning, such as artificial neural networks (ANN), concept learning (CL), Bayesian belief networks (BBN), reinforcement learning (RL), genetic algorithms (GA) and genetic programming (GP), instance-based learning (IBL), decision trees (DT), inductive logic programming (ILP), and analytical learning (AL).
G. John and P. Langley [6] employed the random forest (RF) method to forecast faulty modules using NASA data sets. Prediction of software quality using an artificial neural network was introduced by Khoshgoftaar et al. [7]; in this model they classified the modules of a large telecommunication software system as fault prone or not fault prone, and compared their results with a non-parametric model obtained from a discriminant method. Fenton et al. [8] suggested the use of Bayesian belief networks (BBN) for the prediction of faulty software modules. Elish et al. [9] recommended the use of support vector machines for predicting defective modules in the context of NASA data sets, comparing their prediction performance with other statistical and machine learning models. We discuss a few models in detail to enhance the understanding of machine-learning-based prediction models.
Conclusion
Implementation of defect preventive action not only helps deliver a quality project but is also a valuable investment. This paper surveyed various techniques to prevent software defects. The preventive actions proposed here cover only a few types of defects under each category; many other defects may evolve at each stage. Irrespective of the project life cycle model used, it is recommended to validate and verify each development phase so as to prevent defects from propagating into later stages, avoiding extensive rework.
Appropriate defect detection techniques, an apt choice of analysis method for the logged defects, and defect prediction, which helps identify where defects are likely to occur, together ensure early identification and rectification of defects, thereby preventing their future occurrence.