24-09-2014, 03:03 PM
Warehouse Creation: A Potential Roadblock to Data Warehousing
Abstract
Data warehousing is gaining in popularity as organizations realize the benefits of being able to perform sophisticated
analyses of their data. Recent years have seen the introduction of a number of data-warehousing engines, from both established
database vendors and new players. The engines themselves are relatively easy to use and come with a good set of end-user
tools. However, there is one key stumbling block to the rapid development of data warehouses, namely warehouse
population. Specifically, problems arise in populating a warehouse with existing data because that data exhibits various types of heterogeneity.
Given the lack of good tools, this task has generally been performed by system integrators, such as software consulting
organizations that have developed in-house tools and processes for the task. The general conclusion is that the task has proven
to be labor-intensive, error-prone, and generally frustrating, leading a number of warehousing projects to be abandoned midway
through development. Yet the picture is not as grim as it appears. The problems encountered in warehouse
creation are very similar to those encountered in data integration, which have been studied for about two decades. However,
not all problems relevant to warehouse creation have been solved, and a number of research issues remain. The principal goal of
this paper is to identify the issues common to data integration and data-warehouse creation. We hope this will lead 1) developers
of warehouse-creation tools to examine and, where appropriate, incorporate the techniques developed for data integration,
and 2) researchers in both the data integration and the data warehousing communities to address the open research issues in
this important area.
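To make "various types of heterogeneity" concrete, here is a minimal sketch of warehouse population from two sources that describe the same entity differently: different field names (naming heterogeneity), salary in dollars versus thousands of dollars (unit heterogeneity), and US-style versus ISO dates (representation heterogeneity). Everything in it (the `WarehouseRow` schema, the source field names, and the `from_source_a`, `from_source_b`, and `populate` helpers) is hypothetical and illustrative only; it is not the paper's method or any tool's API.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class WarehouseRow:
    """Target record in a hypothetical common warehouse schema."""
    customer: str
    salary_usd: int
    joined: date


def from_source_a(rec: dict) -> WarehouseRow:
    # Hypothetical source A: field "cust_name", salary in dollars, MM/DD/YYYY dates.
    return WarehouseRow(
        customer=rec["cust_name"].strip().title(),
        salary_usd=int(round(rec["salary"])),
        joined=datetime.strptime(rec["join_date"], "%m/%d/%Y").date(),
    )


def from_source_b(rec: dict) -> WarehouseRow:
    # Hypothetical source B: field "name", salary in thousands of dollars, ISO dates.
    return WarehouseRow(
        customer=rec["name"].strip().title(),
        salary_usd=int(round(rec["salary_k"] * 1000)),
        joined=date.fromisoformat(rec["joined"]),
    )


def populate(source_a: list, source_b: list) -> list:
    """Map both sources into the common schema, then drop exact duplicates."""
    rows = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
    # Frozen dataclasses are hashable; dict.fromkeys removes duplicates
    # while preserving first-seen order.
    return list(dict.fromkeys(rows))


source_a = [{"cust_name": "ada lovelace ", "salary": 52000.0, "join_date": "03/15/2012"}]
source_b = [
    {"name": "Ada Lovelace", "salary_k": 52.0, "joined": "2012-03-15"},
    {"name": "alan turing", "salary_k": 61.5, "joined": "2013-07-01"},
]
rows = populate(source_a, source_b)
print(len(rows))  # → 2
```

The duplicate "Ada Lovelace" record collapses only because both sources were first normalized to the same schema, units, and formats. Real warehouse population usually also needs approximate matching (for example, misspelled names), which this sketch does not attempt.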
3 ISSUES IN DATA WAREHOUSE CREATION
A number of issues must be addressed in building a data
warehouse. Some are associated with the creation of the
warehouse and others with its operation. In the following,
we discuss the issues affecting warehouse creation.
4 SURVEY OF LITERATURE RELEVANT TO WAREHOUSE CREATION
In this section, we provide a brief review of the literature
relevant to warehouse creation. Since the main goal of this
paper is to show the commonality between the issues in
warehouse creation and data integration, the focus of this
survey is on the problems addressed in the research literature
and their proposed solutions. Thus, this survey is not comprehensive
and omits many important issues. For example,
it does not cover the tools and processes that a number of
system integrators, mostly consulting companies, have developed
for handling legacy systems. A good
survey of these is provided in chapter 10 of the excellent
book by Brodie and Stonebraker [5]. Other good resources
include [2], [14], [24], [34], [35]. The specific focus of our
survey is on approaches and techniques reported in the
literature for
CONCLUSIONS
In this paper, we observe that data warehouse
creation is an important task which is increasingly becom