10-06-2014, 11:24 AM
DATA Ware housing :A multitiered Architecture:-
DATA Ware housing.docx (Size: 488.96 KB / Downloads: 17)
INTRODUCTION
The bottom tier is a warehouse database server that is almost always a relational database system. Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources .These tools and utilities perform data extraction, cleaning, and transformation sources, as well as load and refresh functions to update the data warehouse. The data are extracted using application program interfaces known as gateways. A gateway is supported by the underlying DBMS and allows client programs to generate SQL code to be executed at a server.
Examples of gateways include ODBC (Open Database Connection) and OLEDB (Open Linking and Embedding for Databases) by Microsoft and JDBC (Java Database Connection). This tier also contains a metadata repository, which stores information about the data warehouse and its contents.
The metadata repository is further described.
The middle tier is an OLAP server that is typically implemented using either
(1) a relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations; or
(2) a multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations.
Enterprise warehouse:
An enterprise warehouse collects all of the information about subjects spanning the entire organization. It provides corporate-wide data integration, usually from one or more operational systems or external information
providers, and is cross-functional in scope. It typically contains detailed data as well as summarized data, and can range in size from a few gigabytes to hundreds of gigabytes, terabytes, or beyond. An enterprise data warehouse may be implemented on traditional mainframes, computer superservers, or parallel architecture platforms. It requires extensive business modeling and may take years to design and build.
Data mart:
A data mart contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to specific selected subjects. For example, a marketing data mart may confine its subjects to customer, item, and sales. The data contained in data marts tend to be summarized.Data marts are usually implemented on low-cost departmental servers that are UNIX/LINUX- or Windows-based. The implementation cycle of a data mart is more likely to be measured in weeks rather than months or years. However, it may involve complex integration in the long run if its design and planning werenot enterprise-wide.
Depending on the source of data, data marts can be categorized as independent or dependent. Independent data marts are sourced fromdata captured fromone or more operational systems or external information providers, or fromdata generated locally within a particular department or geographic area
Virtual warehouse:
A virtual warehouse is a set of views over operational databases. For efficient query processing, only some of the possible summary views may be materialized. A virtual warehouse is easy to build but requires excess capacity on operational database servers. The pros and cons of the top-down and bottom-up approaches to data warehouse
Development.The top-down development of an enterprise warehouse serves as a systematic solution and minimizes integration problems
Data cube:- A multidimensional data model
A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
dimensions are the perspectives or entities with respect to which an organization wants to keep records.to the dimensions time, item, branch, and location.. Each dimension may have a table associated with it, called a dimension table, which further describes the dimension.
For example, a dimension table for item may contain the attributes item name, brand, and type. Dimension tables can be specified by users or experts, or automatically generated and adjusted based on data distributions. Facts are numerical measures.Think of themas the quantities by which we want to analyze relationships between dimensions.
A sales data warehouse include dollars sold (sales amount in dollars), units sold (number of units sold), and amount budgeted. To view our sales data with an additional fourth dimension, such as supplier.
N=Dimensional data as a series of (n-1)-Dimensional “cubes.”
Each cuboid represents a different degree of summarization.
The result would forma lattice of cuboids, each showing the data at a different level of summarization, or group by. A data cube forthe dimensions time, item, location, and supplier.