A Weighted Genetic Algorithm Based Method for Clustering of Heteroscaled Datasets
Abstract—This paper introduces a weighted genetic algorithm (GA) based clustering method for datasets whose dimensions have different scales. Several types of synthetic two-dimensional scatter data were clustered using the standard k-means clustering method. The weighted GA-based clustering method was developed to address the problem of clustering data with differently scaled (heteroscaled) dimensions. The cluster analysis results obtained with this method were compared with those produced by traditional k-means clustering. Introducing weights into the fitness evaluation component of the meta-heuristic search produced more efficient clustering of heteroscaled data. In real applications, this method can be used in cluster analyses of scatter data whose dimensions have significantly different scales, such as scatter data describing the relationship between kurtosis and fatigue damage.
Keywords - genetic algorithm; data clustering; heteroscaled data set; cluster analysis; k-means clustering; scattered data set
I. INTRODUCTION
In the field of evolutionary computing, the evolutionary principles of survival of the fittest, natural selection, and genetic inheritance are abstracted and modeled into algorithms that search for optimal solutions to a problem. The most popular technique in evolutionary computing research has been the genetic algorithm [1, 2, 3, 8]. Genetic algorithms (GAs) perform meta-heuristic search in complex, large, and multimodal landscapes and provide near-optimal solutions to the objective or fitness functions of optimization problems [4, 8]. GAs and GA-based techniques have been used in fields such as industrial engineering [1] and to optimize the performance of neural networks, fuzzy systems, production systems, wireless systems, and other program structures [2].
II. GENETIC ALGORITHM
Most GA methods have at least the following in common: populations of chromosomes, selection according to fitness, crossover to produce new offspring, and random mutation of new offspring [2]. Solutions in a GA are encoded as chromosomes, which are strings of numbers or characters that represent the values or parameters of a solution to the problem. Chromosomes are commonly encoded as strings of binary, real-valued, integer, octal, or hexadecimal numbers [1]. Each of these encodings has its own advantages and disadvantages for particular data types or problems. In this study, a string of real-valued numbers was selected as the chromosome encoding for the population of potential solutions.
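As an illustration of a real-valued chromosome, the Python sketch below assumes a layout that is common in GA-based clustering: the chromosome concatenates the coordinates of k candidate cluster centroids. This section does not state the paper's exact layout, so treat it as an assumption. The helper names `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence


def encode(centroids: Sequence[Sequence[float]]) -> List[float]:
    """Flatten k centroids of dimension d into one real-valued chromosome of k*d genes."""
    return [coord for centroid in centroids for coord in centroid]


def decode(chromosome: Sequence[float], dim: int) -> List[List[float]]:
    """Split a flat chromosome back into centroids with `dim` coordinates each."""
    if dim <= 0 or len(chromosome) % dim != 0:
        raise ValueError("chromosome length must be a positive multiple of dim")
    return [list(chromosome[i:i + dim]) for i in range(0, len(chromosome), dim)]


# Two 2-D centroids whose dimensions differ greatly in scale.
genes = encode([[1.0, 200.0], [3.5, 950.0]])
print(genes)             # [1.0, 200.0, 3.5, 950.0]
print(decode(genes, 2))  # [[1.0, 200.0], [3.5, 950.0]]
```

Because every gene is a plain float, the second (wide-range) coordinate of each centroid sits in the same string as the first, so operators such as crossover act on both scales alike.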
The set of potential solutions to the problem is represented as a population of chromosomes. Initially, a random population is created, representing different points in the search space of potential solutions [4, 10, 11]. A fitness function assigns a score (fitness) to each chromosome in the current population, and this score determines its survival into the next generation. The fitness of a chromosome depends on how well it solves the problem at hand [10].
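The sketch below shows random initialization and fitness scoring for the clustering case. It makes three assumptions: a chromosome is a flat list of k centroids with d coordinates each, initial genes are drawn uniformly within each dimension's data range, and fitness is 1 / (1 + SSE), where SSE is the sum of squared distances from each point to its nearest centroid. The optional per-dimension `weights` are a hypothetical illustration of how a weight could enter fitness evaluation. They are not the paper's weighting scheme, which this section does not describe.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_population(pop_size: int, k: int, bounds: Sequence[Tuple[float, float]],
                      rng: random.Random) -> List[List[float]]:
    """Create pop_size chromosomes, each holding k centroids drawn uniformly
    inside the per-dimension (low, high) bounds of the data."""
    return [[rng.uniform(lo, hi) for _ in range(k) for lo, hi in bounds]
            for _ in range(pop_size)]


def fitness(chromosome: Sequence[float], data: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> float:
    """Score in (0, 1]: 1 / (1 + SSE), where SSE sums the (optionally weighted)
    squared distance from every point to its nearest centroid."""
    dim = len(data[0])
    w = weights if weights is not None else [1.0] * dim
    centroids = [chromosome[i:i + dim] for i in range(0, len(chromosome), dim)]
    sse = 0.0
    for point in data:
        sse += min(sum(wj * (p - c) ** 2 for wj, p, c in zip(w, point, centroid))
                   for centroid in centroids)
    return 1.0 / (1.0 + sse)


rng = random.Random(42)
data = [(0.1, 10.0), (0.2, 12.0), (0.9, 950.0), (1.0, 990.0)]
bounds = [(0.0, 1.0), (0.0, 1000.0)]
population = random_population(pop_size=6, k=2, bounds=bounds, rng=rng)
# Hypothetical weights that shrink the wide second dimension to a unit range.
weights = [1.0, 1.0 / 1000.0 ** 2]
scores = [fitness(ch, data, weights) for ch in population]
```

Without weights, the second dimension (range 0 to 1000) dominates the squared distances, so the first dimension barely affects which centroid a point is assigned to. This is the heteroscaling problem the paper targets.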
Selection is performed on the current population based on fitness values: chromosomes with higher fitness are more likely to be selected than those with lower fitness. Selection is usually probabilistic; the common selection methods in evolutionary computing research are roulette wheel, tournament, and rank selection [1, 2, 4]. The selected chromosomes are then included in the next generation of the population.
Next, the population undergoes the crossover (also called recombination) genetic operator, which produces offspring from chromosomes in the population. Two parent chromosomes are chosen for the crossover operation, either at random or with one of the selection methods mentioned above. Using single-point, two-point, or N-point crossover, parts of the gene strings of the two parents are swapped to produce two new offspring, which are included in the next generation of the population. The process is repeated a number of times, usually according to a user-specified proportion of the current population.
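A minimal sketch of N-point crossover on real-valued chromosomes follows. Single-point and two-point crossover are the cases `n_points=1` and `n_points=2`. The `crossover_rate` parameter (a hypothetical name) stands in for the user-specified proportion. Parents are paired at random here, whereas the study may instead use one of the selection methods described earlier.

```python
import random
from typing import List, Sequence, Tuple


def n_point_crossover(a: Sequence[float], b: Sequence[float], n_points: int,
                      rng: random.Random) -> Tuple[List[float], List[float]]:
    """Cut both parents at n_points distinct random positions and swap every
    other segment. n_points=1 is single-point, n_points=2 is two-point."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if not 1 <= n_points < len(a):
        raise ValueError("need 1 <= n_points < chromosome length")
    cuts = sorted(rng.sample(range(1, len(a)), n_points)) + [len(a)]
    child1: List[float] = []
    child2: List[float] = []
    start, swap = 0, False
    for cut in cuts:
        seg1, seg2 = list(a[start:cut]), list(b[start:cut])
        if swap:
            seg1, seg2 = seg2, seg1
        child1.extend(seg1)
        child2.extend(seg2)
        start, swap = cut, not swap
    return child1, child2


def crossover_generation(population: Sequence[Sequence[float]], crossover_rate: float,
                         n_points: int, rng: random.Random) -> List[List[float]]:
    """Pair an even number of randomly chosen parents (about crossover_rate of the
    population) and return the two offspring produced by each pair."""
    n_parents = int(crossover_rate * len(population)) // 2 * 2
    parents = rng.sample(list(population), n_parents)
    offspring: List[List[float]] = []
    for a, b in zip(parents[0::2], parents[1::2]):
        offspring.extend(n_point_crossover(a, b, n_points, rng))
    return offspring


rng = random.Random(1)
child1, child2 = n_point_crossover([0.0] * 6, [1.0] * 6, n_points=2, rng=rng)
print(child1, child2)
```

With the all-zeros and all-ones parents in the example, each child shows exactly where the two cut points fell. Each position in the two children still holds one gene from each parent.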