Building data warehouses usingopen source technologies ppt

**seminar flower** · 06-09-2012, 03:29 PM

Building data warehouses usingopen source technologies

.pdf

Building data.pdf (Size: 956.19 KB / Downloads: 76)

Introduction

This article is about building data warehouses. A data warehouse is a computer database that collects, integrates and stores an organization's data with the aim of producing accurate and timely management information and supporting data analysis1. It explains the importance of a good data warehouse and cover the process of building such a specialized database using open source technologies.

Intended audience

This article is meant for software developers, database administrators, integrated software vendors or other people who are facing the challenge of making the analysis of large amounts of data generated by a business or an information system possible. It has been written with the assumption of a basic understanding of the covered concepts like databases, information systems and business transactions in mind.

Why build a data warehouse

There are many reasons to justify building a data warehouse, but almost all of them boil down to the same basic wish: provide means for analysis of data to support management decisions. You are probably already providing management with data like site usage statistics, referrer trends and the number of registered users of their information system. This is basic information, which can be retrieved directly from the information system itself. However, this has some important drawbacks.
First of all, since all of this data is retrieved directly from the information system, it places a considerable burden on this system. Analysis often requires huge amounts of data to be processed, which is often a problem if, for instance, the system's database tables get locked for retrieval. A data warehouse can be completely detached from the information system, even running on a different system.
Secondly, an OLTP2system's data model is rarely optimized for analysis. We all learn to develop systems to use a normalized database modeled after our entity relationships. This is a good thing for the information system, since the underlying model is a close reflection of the system itself, but it makes querying for large sums of aggregated data a costly operation. Furthermore, redundancy is rarely part of this database design, because redundancy is hard to maintain, often causing data inconsistencies or worse. For data analysis, redundancy can be great, because it speeds up the process.

Building a data warehouse

In this chapter, I will guide you through the different phases in building a data warehouse. I will illustrate this by using a simplified webbased information system as an example for creating a data warehouse. This system contains familiar entities such as “requests”, “users” and “pages”. The data model of this system is shown in figure <TODO>.

Designing a dimensional data warehouse

Asking questions

The most important step in building a data warehouse is designing it. You have to ask yourself: “What does the management want to know?”. First, you'll have to figure out which questions need to be answered by analysis of the data warehouse to.be3. For example, is there a correlation between users of a webbased system and the pages it provides? Do certain groups of users visit other pages than others? Where do the visitors from different pages come from? My belief is at least 80% of all management questions about the data generated by an information system can be answered by a decent data warehouse.
Since you are reading this, I assume your company's management has already formulated this kind of questions. If not, make sure they do. Don't think too lightly about this phase, as it is the basis of what the resulting data warehouse can do.

Modeling structures

Once you get a good view on what questions you'll want to ask the data warehouse later on, you have to determine the different dataarrangements that come with these questions. For example, a question about sales will have to operate on a different structure than one about employment. These structures are called “cubes” in the world of OLAP4, because they are essentially an extension to the twodimensional array of data that is stored in a conventional (for example SQL) database. Depending on the data, a cube may have more than two or even three dimensions. We'll model these cubes in a relational database as a star schema, containing a single fact table linking together multiple dimension tables.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	data mining full report	project report tiger	37	374,184,749	16-03-2019, 05:22 PM Last Post: TitkinWY
	A Novel Data Embedding Method Using Adaptive Pixel Pair Matching Report	project girl	3	4,489	15-01-2018, 01:56 PM Last Post: dhanabhagya
	Detecting False Data in Wireless Sensor Network using Efficient Becan Scheme	seminar tips	1	3,235	20-09-2017, 01:03 PM Last Post: jaseela123
	Different Initialization Data and the Performance by the BFM	seminar flower	1	680	20-09-2017, 12:44 PM Last Post: jaseela123
	Wide Area Mobile Data Services	seminar ideas	1	2,373	19-09-2017, 02:35 PM Last Post: jaseela123
	ppt on ONLINE AUCTION	project girl	1	1,881	19-09-2017, 09:49 AM Last Post: jaseela123
	Integrating and Designing the Data Mining Technique System Based on Customer	seminar projects maker	1	782	15-09-2017, 02:45 PM Last Post: jaseela123
	Development of Open Source GIS Application For Visualization of Water Resource Inform	project maker	1	604	15-09-2017, 01:02 PM Last Post: jaseela123
	Programming Paradigm ppt	seminar paper	1	2,437	15-09-2017, 11:13 AM Last Post: jaseela123
	Survey of Privacy Protection for Medical Data	project maker	1	649	13-09-2017, 01:14 PM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.