12-12-2012, 05:41 PM
Application of Genetic Algorithm in Software Testing
Application of Genetic Algorithm in Software Testing.pdf (Size: 235.52 KB / Downloads: 68)
Abstract
This paper presents a method for optimizing software testing efficiency by identifying the
most critical path clusters in a program. We do this by developing variable length Genetic
Algorithms that optimize and select the software path clusters which are weighted in
accordance with the criticality of the path. Exhaustive software testing is rarely possible
because it becomes intractable for even medium sized software. Typically only parts of a
program can be tested, but these parts are not necessarily the most error prone. Therefore,
we are developing a more selective approach to testing by focusing on those parts that are
most critical so that these paths can be tested first. By identifying the most critical paths, the
testing efficiency can be increased.
Keywords: Software Testing, Genetic Algorithm, Test Data
Introduction
The verification and validation of software through dynamic testing is an area of software
engineering where progress towards automation has been slow. In particular the automatic
design and generation of test data remains, by and large, a manual activity. Software testing
remains the primary technique used to gain consumers’ confidence in the software. The
process of testing any software system is an enormous task which is time consuming and
costly [1] [2]. Software testing is laborious and time-consuming work; it spends almost 50%
of software system development resources [1] [2]. Generally, the goal of software testing is to
design a set of minimal number of test cases such that it reveals as many faults as possible. As
mentioned earlier, software testing is a lengthy and time-consuming work [3]. Absolutely, an
automated software testing can significantly reduce the cost of developing software. Other
benefits include: the test preparation can be done in advance, the test runs would be
considerably fast, and the confidence of the testing result can be increased. However,
software testing automation is not a straight forward process. For years, many researchers
have proposed different methods to generate test data automatically, i.e. different methods for
developing test data/case generators [4, 5, 6, 7, 8, 9]. The development of techniques that will
also support the automation of software testing will result in significant cost savings. The
application of artificial intelligence (AI) techniques in Software Engineering (SE) is an
emerging area of research that brings about the cross fertilization of ideas across two
domains. A number of researchers did the work on software testing using artificial
inelegance; they examine the effective use of AI for SE related activities which are inherently
International Journal of Software Engineering and Its Applications
Vol. 3, No.4, October 2009
88
knowledge intensive and human-centered. These issues necessitate the need to investigate the
suitability of search algorithms, e.g. simulated annealing, genetic algorithms, and ant colony
optimization as a better alternative for developing test data generators [4, 5].Using
evolutionary computations, researchers have done some work in developing genetic
algorithms (GA)-based test data generators [6, 7, 8, 9, 10]. A variety of techniques for test
data generation have been developed previously [12, 13, 14, 15, 16] and these can be
categorized as structural and functional testing.
In this paper, we present the results of our research into the application of GA search
approach, to identify the most error prone paths in a software construct. The paper is
structured in the following way: section 2 describe basic structure of genetic algorithm, in
section 3 we discussed our proposed algorithm for test data generator, while section 4
represents the case study of proposed approach using an example and finally in section
5.describe the conclusions part.
Genetic algorithm
A GA [10] starts with guesses and attempts to improve the guesses by evolution. A GA
will typically have five parts: (1) a representation of a guess called a chromosome, (2) an
initial pool of chromosomes, (3) a fitness function, (4) a selection function and (5) a crossover
operator and a mutation operator. A chromosome can be a binary string or a more elaborate
data structure. The initial pool of chromosomes can be randomly produced or manually
created. The fitness function measures the suitability of a chromosome to meet a specified
objective: for coverage based ATG, a chromosome is fitter if it corresponds to greater
coverage. The selection function decides which chromosomes will participate in the evolution
stage of the genetic algorithm made up by the crossover and mutation operators. The
crossover operator exchanges genes from two chromosomes and creates two new
chromosomes. The mutation operator changes a gene in a chromosome and creates one new
chromosome. GA has well-defined steps:
A basic algorithm for a GA is as follows [1]
The pseudo code for GA is:
Initialize (population)
Evaluate (population)
While (stopping condition not satisfied) do
{
Selection (population)
Crossover (population)
Mutate (population)
Evaluate (population)
}
The algorithm will iterate until the population has evolved to form a solution to the problem,
or until a maximum number of iterations have taken place (suggesting that a solution is not
going to be found given the resources available).
3. Proposed Approach
International Journal of Software Engineering and Its Applications
Vol. 3, No.4, October 2009
89
This section describes details of our proposed approach, to test data generation using
GA; more precisely, it describes our fitness function. Our approach uses a weighted
CFG. Path testing searches the program domain for suitable test cases that covers every
possible path in the software under test (SUT). However, it is generally impossible to
achieve this goal, for several reasons. First, a program may contain an infinite number
of paths when the program has loops. Second, the number of paths in a program is
exponential to the number of branches in it and many of them may be unfeasible. Third,
the number of test cases is too large, since each path can be covered by several test
cases. For these reasons, the problem of path testing can become a NP complete
problem making the covering of all possible paths computationally impractical. Since it
is impossible to cover all paths in software, the problem of path testing selects a subset
of paths to execute and find test data to cover it.
Our algorithm works on control flow graph (CFG). CFG is a simple notation for the
representation of control flow. An independent path is any path through the program
that introduces at least one new set of processing statements or a new condition. When
stated in terms of a flow graph an independent path must move along at least edge that
has not been traversed before the path is defined.
Procedure
Input: CFG of the code
Assigning weights to edges of CFG – The first step of algorithm is assigning weights
to CFG. More weights are assigned to edges which are critical so to say, that are part of
paths which are more error prone. An initial credit is taken (100 or 10), if CFG is dense
i.e. large numbers of edges are there than initial credit should be taken as 100 and if
CFG is sparse (small codes) then it can be taken as 10.
At each node of CFG the incoming credit (sum of the weights of all the incoming
edges) is divided and distributed to all the outgoing edges of the node.
Distribution of weights is done as follows:
Take ‘n’ to be the number of outgoing edges.
We have considered an 80-20 rule. 80 percentage of weight of the incoming credit is
given to loops and branches and the remaining 20 percentage of the incoming credit is
given to the edges in sequential path. From each node if n1 is the number of edges in
sequential path and n2 is the number of edges in looping and branching paths, then n1
edges are given 20 percentage of incoming weight and then divided equally amongst
them and the remaining 80 percentage is given to n2 edges.
If there is only one outgoing edge from a particular node than the incoming weight is
assigned to the outgoing edge.
In figure 1 of illustration it can be seen that a similar procedure has been followed to
assign weights to the edges.