Alliance Rules for Data Warehouse Cleansing
Abstract- Data cleansing is an activity performed on the
data sets of a data warehouse to enhance and maintain the
quality and consistency of the data. This paper addresses
the problems caused by dirty data, the ways dirty data
enters a data warehouse, and the detection of dirty data
within it. The paper views the data cleansing procedure from
a different perspective. It provides an algorithm for the
detection of errors and dirty data in the data sets of an
already existing data warehouse. The paper defines
alliance rules, based on the concept of mathematical
association rules, to identify dirty and faulty data in
a data warehouse. The research makes prominent use of
q-grams [1] to detect the errors.
Keywords- data warehouse; data marts; data cleansing
I. INTRODUCTION
A data warehouse is a complex structure that
stores a huge amount of data. The processes and
procedures applied to build and maintain
a data warehouse are highly
sensitive and time variant. An enterprise’s data
warehouse, which the enterprise uses for
knowledge discovery and trend analysis, is maintained
through regular updates, insertions and deletions. The
sensitive and time-variant nature of these operations
hampers the integrity and quality of the data residing in the
data warehouse. Data warehouse users characterize
data quality by coherency, correctness
and accuracy, along with the newness and accessibility
of the data. The quality of data degrades with these
routine updates, which has a strong impact on
processes such as knowledge discovery, data mining
and trend analysis performed on the data warehouse.
The data warehouse [2] of an enterprise consolidates
data from multiple sources across the
organization in order to support enterprise-wide
decision making, reporting, analysis and
planning. The processes performed on the data warehouse
for these activities are highly sensitive to
the quality of the data. They depend on the accuracy and
consistency of the data. Degraded data quality leads these
processes to wrong conclusions, which ultimately
wastes resources and assets of all kinds.
An operation such as data mining performed on a data
warehouse is used in organizations mainly for
strategic decision making and planning. Data mining
[3] is primarily used today by companies in the
retail, financial, communication, and
marketing sectors. It enables these companies to determine
relationships and associations among the variables that
affect organizational processes.
Processes like data mining are costly and time
consuming. They consume large amounts of resources
in terms of money, time, human effort, etc. These
processes are highly critical and demand accurate
inputs to give reliable results. Degraded data quality
decreases the reliability of the results. Applying these
processes to low-quality, inconsistent data
serves no purpose, as the results derived cannot be
relied upon. The whole
purpose of performing such specialized processes on a data
warehouse is defeated. This ultimately results
in the waste of resources and assets of all kinds.
The only viable solution to this tedious
problem is an automated tool specially
designed for the data warehouse to cleanse the corrupt,
faulty and dirty data that
lowers the quality of the data. Such an automated
data cleansing tool supports the enhancement and
maintenance of data quality. An automated data
cleansing tool is the only practical, feasible and cost-effective
method for raising the quality of data to a
reasonable level. Although the issue of dirty data in an
existing data warehouse, and its solution through an automated
data cleansing tool, is highly important to an
organization’s critical and strategic decision making, it
has not received the attention it
deserves.
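
As a rough illustration of how the q-grams [1] mentioned in the abstract could help an automated tool flag suspect values, the sketch below compares strings by their overlapping q-character substrings. It is not the algorithm proposed in this paper. The function names, the '#' and '$' padding characters, the Jaccard measure and the 0.4 threshold are all assumptions chosen for the example.

```python
def qgrams(text, q=3):
    """Return the set of q-grams of `text`, padded so edge characters count."""
    # '#' and '$' padding is an assumed convention, not taken from the paper.
    padded = "#" * (q - 1) + text.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard similarity of the q-gram sets of two strings (0.0 to 1.0)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def flag_suspects(values, reference, threshold=0.4, q=3):
    """Pair each non-reference value with its closest reference value.

    A value is reported only if it is absent from `reference` but its
    q-gram similarity to some reference value reaches `threshold`
    (an assumed cut-off), which suggests a misspelling.
    """
    suspects = []
    for value in values:
        if value in reference:
            continue  # exact match: treated as clean
        best = max(reference, key=lambda r: qgram_similarity(value, r, q))
        if qgram_similarity(value, best, q) >= threshold:
            suspects.append((value, best))
    return suspects


if __name__ == "__main__":
    # Hypothetical city column checked against a hypothetical reference list.
    print(flag_suspects(["Mumbay", "Delhi", "Zzz"], ["Mumbai", "Delhi"]))
```

Here "Mumbay" shares five of eleven distinct 3-grams with "Mumbai" and is flagged as a likely misspelling, while "Zzz" shares none with any reference value and is left for other checks.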