04-09-2012, 11:17 AM
data warehouse
Udata warehouse.docx (Size: 25.46 KB / Downloads: 27)
In computing, a data warehouse (DW or DWH) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems (such as marketplace, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
The typical ETL-based data warehouse uses staging, integration, and access layers to house its key functions. The staging layer, or staging database, stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data is then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema.
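The layers described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a reference design: it uses Python's built-in sqlite3 module, and every table and column name (staging_sales, ods_sales, dim_product, fact_sales) is hypothetical and chosen for the example.

```python
import sqlite3


def build_warehouse(raw_rows):
    """Push raw (sale_date, product, amount) strings through the three layers."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Staging layer: raw data stored exactly as extracted from the source.
    cur.execute("CREATE TABLE staging_sales (sale_date TEXT, product TEXT, amount TEXT)")
    cur.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

    # Integration layer (ODS): trimmed, consistently cased, correctly typed.
    cur.execute("CREATE TABLE ods_sales (sale_date TEXT, product TEXT, amount REAL)")
    cur.execute(
        """
        INSERT INTO ods_sales
        SELECT TRIM(sale_date), UPPER(TRIM(product)), CAST(amount AS REAL)
        FROM staging_sales
        """
    )

    # Warehouse database: one dimension table and one fact table (a tiny star schema).
    cur.execute(
        "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT UNIQUE)"
    )
    cur.execute(
        "INSERT INTO dim_product (product_name) "
        "SELECT DISTINCT product FROM ods_sales ORDER BY product"
    )
    cur.execute(
        "CREATE TABLE fact_sales ("
        "product_key INTEGER REFERENCES dim_product(product_key), "
        "sale_date TEXT, amount REAL)"
    )
    cur.execute(
        """
        INSERT INTO fact_sales
        SELECT d.product_key, o.sale_date, o.amount
        FROM ods_sales o JOIN dim_product d ON d.product_name = o.product
        """
    )
    con.commit()
    return con


def sales_by_product(con):
    """Access layer: an aggregate query that joins the fact table to its dimension."""
    rows = con.execute(
        """
        SELECT d.product_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_product d USING (product_key)
        GROUP BY d.product_name
        ORDER BY d.product_name
        """
    ).fetchall()
    return dict(rows)
```

Calling build_warehouse with a few rows such as (" widget ", "Widget", "gadget") shows the integration step folding inconsistent spellings into one dimension row before the facts are loaded.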
The access layer helps users retrieve data.[1]
A data warehouse constructed from integrated data source systems does not require ETL, staging databases, or operational data store databases. The integrated data source systems may be considered to be a part of a distributed operational data store layer. Data federation methods or data virtualization methods may be used to access the distributed integrated source data systems to consolidate and aggregate data directly into the data warehouse database tables. Unlike the ETL-based data warehouse, the integrated source data systems and the data warehouse are all integrated, since there is no transformation of dimensional or reference data. This integrated data warehouse architecture supports drill-down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems.
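A rough sketch of the federated approach, assuming two hypothetical source systems (store_a and store_b) that already share the same customer identifiers. SQLite's ATTACH DATABASE stands in for a real federation or virtualization product here, and all table and column names are invented for the example.

```python
import sqlite3

# Federated query over both source systems; no staging copy or transformation.
FEDERATED_ORDERS = """
    SELECT 'store_a' AS source, order_id, customer_id, amount FROM store_a.orders
    UNION ALL
    SELECT 'store_b' AS source, order_id, customer_id, amount FROM store_b.orders
"""


def federated_warehouse():
    """Aggregate directly from the attached sources into a warehouse table."""
    con = sqlite3.connect(":memory:")  # plays the role of the warehouse database
    con.execute("ATTACH DATABASE ':memory:' AS store_a")
    con.execute("ATTACH DATABASE ':memory:' AS store_b")

    # Hypothetical transactional data held by each source system.
    sources = {
        "store_a": [(1, "C1", 20.0), (2, "C2", 5.0)],
        "store_b": [(1, "C1", 7.5)],
    }
    for name, rows in sources.items():
        con.execute(
            f"CREATE TABLE {name}.orders (order_id INTEGER, customer_id TEXT, amount REAL)"
        )
        con.executemany(f"INSERT INTO {name}.orders VALUES (?, ?, ?)", rows)

    # Consolidate and aggregate straight into a warehouse table.
    con.execute(
        "CREATE TABLE customer_totals AS "
        f"SELECT customer_id, SUM(amount) AS total FROM ({FEDERATED_ORDERS}) "
        "GROUP BY customer_id"
    )
    con.commit()
    return con


def drill_down(con, customer_id):
    """Go from an aggregate row back to the source transactions behind it."""
    return con.execute(
        f"SELECT source, order_id, amount FROM ({FEDERATED_ORDERS}) "
        "WHERE customer_id = ? ORDER BY source, order_id",
        (customer_id,),
    ).fetchall()
```

Because the customer keys are never transformed, the same customer_id value works both in the aggregate table and in the source tables, which is what makes the drill-down a plain filter.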
Benefits of a data warehouse
A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
1 Maintain data history, even if the source transaction systems do not.
2 Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
3 Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
4 Present the organization's information consistently.
5 Provide a single common data model for all data of interest regardless of the data's source.
6 Restructure the data so that it makes sense to the business users.
7 Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
8 Add value to operational business applications, notably customer relationship management (CRM) systems.
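The first benefit in the list, keeping history that the source system overwrites, is often handled by versioning rows with validity dates (commonly called a type 2 slowly changing dimension). The following is a minimal sketch of that idea in plain Python; the function names and the record layout (key, value, valid_from, valid_to) are assumptions made for illustration only.

```python
from datetime import date


def record_change(history, key, value, as_of):
    """Store a new version of `key` when its value changes; close the old version."""
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["value"] == value:
            return history  # nothing changed, so no new version is needed
        current["valid_to"] = as_of  # the old version stops being valid here
    history.append({"key": key, "value": value, "valid_from": as_of, "valid_to": None})
    return history


def value_as_of(history, key, when):
    """Return the value that was valid for `key` on the date `when`, or None."""
    for row in history:
        if (
            row["key"] == key
            and row["valid_from"] <= when
            and (row["valid_to"] is None or when < row["valid_to"])
        ):
            return row["value"]
    return None
```

A source system that only stores a customer's current city would lose the old value on update, whereas the history list above can still answer what the city was on any earlier date.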
History
The concept of data warehousing dates back to the late 1980s[2] when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that were tailored for ready access by users.