03-12-2012, 05:43 PM
DATA QUALITY MINING USING GENETIC ALGORITHM
ABSTRACT
Data Quality Mining (DQM) is a new approach to data mining, taken from the business point of view.
People use information attributes as a tool for assessing data quality.
The goal of DQM is to employ data mining methods to detect, quantify, explain and correct data quality deficiencies in very large databases.
GENETIC ALGORITHM
A GA is an iterative process that generates new populations of strings from old ones.
A standard GA applies genetic operators such as selection, crossover and mutation to an initially random population in order to compute a whole generation of new strings.
Selection deals with the probabilistic survival of the fittest, in that fitter chromosomes are more likely to be chosen to survive, where fitness is a comparable measure of how well a chromosome solves the problem at hand.
Crossover takes individual chromosomes from the population and combines them to form new ones.
Mutation alters the new solutions so as to add stochasticity to the search for better solutions.
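As a rough illustration of the three operators, here is a minimal sketch of one GA generation on bit-string chromosomes. It is not the implementation behind the slides: the function names, the use of roulette-wheel selection and one-point crossover, the mutation rate, and the toy fitness (number of 1 bits) are all assumptions made for the example.

```python
import random

def roulette_select(population, fitnesses, rng):
    # Probabilistic survival of the fittest: a chromosome is picked with
    # probability proportional to its fitness (assumed selection scheme).
    return rng.choices(population, weights=fitnesses, k=1)[0]

def one_point_crossover(a, b, rng):
    # Cut both parents at the same random point and swap the tails.
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in chrom]

def next_generation(population, fitness_fn, rng, mutation_rate=0.01):
    # Small offset keeps the selection weights positive even if every
    # chromosome has fitness 0.
    fitnesses = [fitness_fn(c) + 1e-9 for c in population]
    new_population = []
    while len(new_population) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = one_point_crossover(p1, p2, rng)
        new_population.append(mutate(c1, mutation_rate, rng))
        new_population.append(mutate(c2, mutation_rate, rng))
    return new_population[:len(population)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Initially random population of 6 chromosomes, 8 bits each.
    population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
    for _ in range(20):
        population = next_generation(population, sum, rng)
    print(max(sum(c) for c in population))
```

Calling `next_generation` repeatedly gives the iteration described above: each call turns the old population into a whole new one of the same size.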
METHODOLOGY
Steps in methodology
1. Load a sample of records from the database that fits in memory.
2. Generate N chromosomes randomly.
3. Decode them to get the values of the different attributes.
4. Scan the loaded sample to find the support of the antecedent part, the consequent part and the whole rule.
5. Find the confidence, comprehensibility, completeness and interestingness values.
6. Rank the chromosomes depending on the non-dominance property.
7. Assign fitness to the chromosomes using the ranks, as mentioned earlier.
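Steps 4 to 6 can be sketched roughly as follows. This is only an illustration under stated assumptions: records and rule parts are represented as plain dictionaries, the helper names are hypothetical, and all objectives are assumed to be maximised. The formulas for comprehensibility, completeness and interestingness, and the rank-to-fitness mapping of step 7, are not given in the post, so they are left out; the ranking function simply takes one tuple of objective values per chromosome.

```python
def matches(record, conditions):
    # True if the record satisfies every attribute == value condition.
    return all(record.get(attr) == val for attr, val in conditions.items())

def support_counts(records, antecedent, consequent):
    # Step 4: one scan of the sample gives the support counts of the
    # antecedent, the consequent, and the whole rule.
    n_a = n_c = n_ac = 0
    for r in records:
        a, c = matches(r, antecedent), matches(r, consequent)
        n_a += a
        n_c += c
        n_ac += a and c
    return n_a, n_c, n_ac

def confidence(n_a, n_ac):
    # Step 5 (confidence only): fraction of antecedent matches that also
    # match the consequent.
    return n_ac / n_a if n_a else 0.0

def dominates(u, v):
    # u dominates v if it is no worse on every objective and strictly
    # better on at least one (all objectives maximised).
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def pareto_ranks(objectives):
    # Step 6: rank 1 is the non-dominated front; remove it, and the next
    # non-dominated front gets rank 2, and so on.
    remaining = set(range(len(objectives)))
    ranks = [0] * len(objectives)
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

if __name__ == "__main__":
    sample = [
        {"age": "young", "buys": "yes"},
        {"age": "young", "buys": "no"},
        {"age": "old", "buys": "yes"},
        {"age": "young", "buys": "yes"},
    ]
    n_a, n_c, n_ac = support_counts(sample, {"age": "young"}, {"buys": "yes"})
    print(n_a, n_c, n_ac, confidence(n_a, n_ac))
    print(pareto_ranks([(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1)]))
```

Ranking by non-dominance avoids having to collapse the measures into a single weighted score: a rule is only ranked below another if it is worse or equal on every measure.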
CONCLUSION
In the present work, we have used a Pareto-based genetic algorithm to solve the multi-objective rule mining problem using four measures: completeness, comprehensibility, interestingness and predictive accuracy.
This approach may not work properly if the given dataset is not homogeneous, because it is applied to a sample of the dataset.