22-05-2012, 10:49 AM
DATA WAREHOUSE PROCESS MANAGEMENT
Introduction
Data Warehouses (DW) integrate data from multiple heterogeneous information sources and
transform them into a multidimensional representation for decision support applications. Apart from a
complex architecture, involving data sources, the data staging area, operational data stores, the global
data warehouse, the client data marts, etc., a data warehouse is also characterized by a complex
lifecycle. In a permanent design phase, the designer has to produce and maintain a conceptual model
and a usually voluminous logical schema, accompanied by a detailed physical design for efficiency
reasons. The designer must also deal with data warehouse administrative processes, which are complex
in structure, large in number and hard to code; deadlines must be met for the population of the data
warehouse and contingency actions taken in the case of errors. Finally, the evolution phase involves a
combination of design and administration tasks: as time passes, the business rules of an organization
change, new data are requested by the end users, new sources of information become available, and the
data warehouse architecture must evolve to efficiently support the decision-making process within the
organization that owns the data warehouse.
Background and Motivation
In this section we will detail the background work and the motivation behind the proposed
metamodel for data warehouse operational processes.
Background Work for the Metamodel: The Quest for Formal Models of Quality
For decision support, a data warehouse must provide high quality of data and service. Error rates in
databases have been reported to be in the ten percent range, and even higher, in a variety of applications.
[65] report that more than $2 billion of U.S. federal loan money had been lost because of poor data
quality at a single agency; manufacturing companies spend over 25% of their sales on wasteful
practices, service companies up to 40%. In certain vertical markets (e.g., the public sector) data quality
is not an option but a constraint for the proper operation of the data warehouse. Thus, data quality
problems seem to introduce even more complexity and computational burden to the loading of the data
warehouse. In the DWQ project (Foundations of Data Warehouse Quality [30]), we have attacked the
problem of quality-oriented design and administration in a formal way, without sacrificing optimization
and practical exploitation of our research results. In this subsection we summarize our results as far as
needed for this paper.
The Metamodel of Data Warehouse Operational Processes
We start the presentation of the metamodel for data warehouse operational processes from the
logical perspective, showing in the next two sections how the metamodel deals with the requirements
of structure complexity and capturing of data semantics. Then, in subsections 3.3 and 3.4, we present
the physical and the conceptual perspectives, respectively. The physical perspective also fulfills
the requirement of trace logging.