07-07-2014, 02:02 PM
Warehousing
Introduction
What is a Data Warehouse?
A data warehouse is a collection of integrated databases designed to support a decision support system (DSS).
According to the definition given by Inmon, often called the father of data warehousing (Inmon, 1992a, p. 5):
It is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time.
What is a Data Warehouse? A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
Benefits of a Data Warehouse
Maintain data history, even if the source transaction systems do not.
Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
Restructure the data so that it makes sense to the business users.
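The benefits above can be illustrated with a minimal sketch. It assumes a hypothetical customer feed with a free-form gender field; the names CODE_MAP, clean_record, load, and history are made up for illustration and do not come from the slides. Each load appends rows with a load date (history is kept), codes are mapped to one consistent value, and unrecognized values are flagged rather than dropped.

```python
from datetime import date

# Hypothetical mapping from source-system codes to one warehouse code.
CODE_MAP = {"M": "M", "MALE": "M", "1": "M", "F": "F", "FEMALE": "F", "2": "F"}

history = []  # append-only store, so earlier loads are never overwritten


def clean_record(record, load_date):
    """Return a warehouse row with a consistent code, a quality flag, and a load date."""
    raw = str(record.get("gender", "")).strip().upper()
    code = CODE_MAP.get(raw)
    return {
        "customer_id": record["customer_id"],
        "gender": code if code is not None else "UNKNOWN",
        "is_bad_data": code is None,  # flag bad data instead of silently dropping it
        "load_date": load_date,       # records which load this row belongs to
    }


def load(records, load_date):
    """Append cleaned rows for one load; existing rows stay untouched."""
    for record in records:
        history.append(clean_record(record, load_date))


load([{"customer_id": 1, "gender": "male"}], date(2014, 7, 1))
load([{"customer_id": 1, "gender": "x"}], date(2014, 7, 7))
```

After the two loads, history holds both versions of customer 1, which is the kind of record a source transaction system that overwrites in place would lose.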
Disadvantages of the Query-Driven Approach
Delay in query processing
Slow or unavailable information sources
Complex filtering and integration
Inefficient and potentially expensive for frequent queries
Competes with local processing at sources
Hasn’t caught on in industry
Extraction, Transformation, and Loading
Data Extraction: The data held in the data warehouse is extracted from the organization's operational data stores. It can also be obtained from various external sources, such as flat files, HTML and XML documents, and plain text.
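As a rough sketch of the extraction step, the following pulls rows from two of the source types mentioned, a flat file (CSV) and an XML document, into one common staging list. The sample data, the function names extract_csv and extract_xml, and the staged list are assumptions made for illustration; only Python standard-library modules are used.

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_csv(text):
    """Extract rows from flat-file (CSV) text as a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_xml(text, row_tag):
    """Extract one dict per <row_tag> element, keyed by child tag name."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in elem} for elem in root.iter(row_tag)]


# Hypothetical source contents; real sources would be read from files or feeds.
csv_source = "id,name\n1,Alice\n2,Bob\n"
xml_source = "<customers><customer><id>3</id><name>Carol</name></customer></customers>"

# Both sources end up in the same row shape, ready for transformation.
staged = extract_csv(csv_source) + extract_xml(xml_source, "customer")
```

All values arrive as strings here; converting types and reconciling codes would belong to the transformation step.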
Query Cache
The query cache keeps a record of recently executed queries together with their results. The primary goal is to build intelligence into the system at the warehouse level, so that it remembers the work it has recently performed. The cache is maintained at the warehouse level and stores tuples of the form (Q, QR), where Q is a query and QR is its result [15].
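A minimal sketch of such a cache follows. It stores (Q, QR) pairs keyed by the query text and re-runs a query only on a miss. The class name QueryCache, the run_query callback, the capacity limit, and the least-recently-used eviction are all assumptions for illustration; the source does not say how entries are matched or evicted.

```python
from collections import OrderedDict


class QueryCache:
    """Keeps recently executed queries (Q) with their results (QR)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._entries = OrderedDict()  # maps Q -> QR, oldest entry first

    def get_or_run(self, query, run_query):
        """Return the cached result for query, or run it and remember the result."""
        key = query.strip()  # assumption: match on exact query text
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = run_query(query)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used pair
        return result


calls = []


def fake_warehouse(query):
    """Stand-in for real query execution; records each call it receives."""
    calls.append(query)
    return "rows for " + query


cache = QueryCache(capacity=2)
first = cache.get_or_run("SELECT 1", fake_warehouse)
second = cache.get_or_run("SELECT 1", fake_warehouse)  # served from the cache
```

A real warehouse would also need to invalidate cached results when the underlying data is reloaded, which this sketch leaves out.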