14-02-2013, 11:59 AM
DATA WAREHOUSE
DATA WAREHOUSE.doc (Size: 521.5 KB / Downloads: 21)
INTRODUCTION
A data warehouse is the main repository of the organization's historical data, its corporate memory. For example, an organization would use the information that's stored in its data warehouse to find out what day of the week they sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001-2005. In other words, the data warehouse contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis on the information without slowing down the operational systems. While operational systems are optimized for simplicity and speed of modification (online transaction processing, or OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (on line analytical processing, or OLAP). Frequently data in data warehouses is heavily denormalised, summarised and/or stored in a dimension-based model but this is not always required to achieve acceptable query response times.
DATA STORAGE METHOTS
In OLTP - online transaction processing systems relational database design use the discipline of data modeling and generally follow the Codd rules of data normalization in order to ensure absolute data integrity. Less complex information is broken down into its most simple structures (a table) where all of the individual atomic level elements relate to each other and satisfy the normalization rules. Codd defines 5 increasing stringent rules of normalization and typically OLTP systems achieve a 3rd level normalization. Fully normalized OLTP database designs often result in having information from a business transaction stored in dozens to hundreds of tables. Relational database managers are efficient at managing the relationships between tables and result in very fast insert/update performance because only a little bit of data is affected in each relational transaction.
OLTP databases are efficient because they are typically only dealing with the information around a single transaction. In reporting and analysis, thousands to billions of transactions may need to be reassembled imposing a huge workload on the relational database. Given enough time the software can usually return the requested results, but because of the negative performance impact on the machine and all of its hosted applications, data warehousing professionals recommend that reporting databases be physically separated from the OLTP database.
In addition, data warehousing suggests that data be restructured and reformatted to facilitate query and analysis by novice users. OLTP databases are designed to provide good performance by rigidly defined applications built by programmers fluent in the constraints and conventions of the technology. Add in frequent enhancements, and to many a database is just a collection of cryptic names, seemingly unrelated and obscure structures that store data using incomprehensible coding schemes. All factors that while improving performance, complicate use by untrained people. Lastly, the data warehouse needs to support high volumes of data gathered over extended periods of time and are subject to complex queries and need to accommodate formats and definitions of inherited from independently designed package and legacy systems.
Concept
Information processing and managing a database are the two important components for any business to have a smooth operation. Data warehousing is a concept where the information systems are computerized. Since there would be a lot of applications that run simultaneously, there is a possibility that each individual processes create an exclusive “secondary data” which originates from the source. The data warehouses are useful in tracking all the information down and are useful in analyzing this information and improve performance. They offer a wide variety of options and are highly compatible to virtually all working environments. They help the managers of companies to gauge the progress that is made by the company over a period of time and also explore new ways to improve the growth of the company. There are many “it’s” in business and these data warehouses are read only integrated databases that help to answer these questions. They are useful to form a structure of operations and analyze the subject matter on a given time period.
The structure
As is the case with all computer applications there are various steps that are involved in planning a data warehouse. The need is analyzed and most of the time the end user is taken into consideration and their input forms an invaluable asset in building a customized database. The business requirements are analyzed and the “need” is discovered. That would then become the focus area. If a company wants to analyze all its records and use the research in improving performance.
A data warehouse allows the manager to focus on this area. After the need is zeroed in on then a conceptual data model is designed. This model is then used a basic structure that companies follow to build a physical database design. A number of iterations, technical decisions and prototypes are formulated. Then the systems development life cycle of design, development, implementation and support begins.
Collection of data
The project team analyzes various kinds of data that need to go into the database and also where they can find all this information that they can use to build the database. There are two different kinds of data. One which can be found internally in the company and the other is the data that comes from another source. There would be another team of professionals who would work on the creation, extraction programs that are used to collect all the information that is needed from a number of databases, Files or legacy systems. They identify these sources and ten copy them onto a staging area outside the database. They clean all the data which is described as cleansing and make sure that it does not contain any errors. They copy all the data into his data warehouse. This concept of data extraction from the source and the selection, transformation processes have been unique benchmarks of this concept. This is very important for the project to become successful. A lot of meticulous planning is involved in arriving at a step by step configuration of all the data from the source to the data warehouse.