12-08-2014, 09:59 AM
Data Mining: Concepts and Techniques
What is a Data Warehouse?
Defined in many different ways, but not rigorously.
A decision support database that is maintained separately from the organization’s operational database
Supports information processing by providing a solid platform of consolidated, historical data for analysis.
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon
Data warehousing:
The process of constructing and using data warehouses
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources
relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
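The hotel-price example can be sketched as a small conversion step. This is only an illustration: the `HotelPrice` record, the exchange rates, the 10% tax rate, and the warehouse convention (USD, tax included, breakfast kept as a separate flag) are all assumptions made for the example, not something the slides specify.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to USD; a real pipeline would read these
# from a maintained reference table.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "GBP": 1.5}


@dataclass
class HotelPrice:
    amount: float             # price as reported by the source system
    currency: str             # ISO code used by the source
    tax_included: bool        # sources disagree on whether tax is in the price
    breakfast_included: bool  # carried through as an attribute, not priced


def to_warehouse_record(price, tax_rate=0.10):
    """Convert a source price to the assumed warehouse convention:
    USD, tax included, breakfast recorded as a separate flag."""
    usd = price.amount * RATES_TO_USD[price.currency]
    if not price.tax_included:
        usd *= 1 + tax_rate  # assumed flat tax rate, for illustration only
    return {
        "price_usd_with_tax": round(usd, 2),
        "breakfast_included": price.breakfast_included,
    }


# Two sources quoting differently now land in one comparable form.
rec_a = to_warehouse_record(HotelPrice(100.0, "EUR", True, False))
rec_b = to_warehouse_record(HotelPrice(100.0, "USD", False, True))
```

After conversion, every record in the warehouse uses the same currency and tax treatment, so prices from different sources can be compared directly.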
Enterprise warehouse
collects all of the information about subjects spanning the entire organization
Data Mart
a subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
Independent vs. dependent (directly from warehouse) data mart
Virtual warehouse
A set of views over operational databases
Only some of the possible summary views may be materialized
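A minimal sketch of the virtual-warehouse idea, using Python's built-in `sqlite3` module. The `orders` table, the `sales_by_region` view, and the sample rows are hypothetical. Since SQLite has no native materialized views, "materializing" is imitated here by copying the view's result into an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for an operational database table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'east', 100.0), (2, 'east', 50.0), (3, 'west', 70.0);

-- The virtual warehouse: a summary view defined over the operational data.
CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

# Querying the view recomputes the summary from the operational table.
virtual = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

# Imitate materialization by storing the view's current result.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new operational record shows up in the view but not in the stored copy.
conn.execute("INSERT INTO orders VALUES (4, 'west', 30.0)")
fresh = conn.execute(
    "SELECT total FROM sales_by_region WHERE region = 'west'"
).fetchone()[0]
stale = conn.execute(
    "SELECT total FROM sales_by_region_mat WHERE region = 'west'"
).fetchone()[0]
```

The trade-off is visible in the last two queries: the plain view is always current but is recomputed on every query, while the stored copy is fast to read and must be refreshed when the source changes.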
From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model which views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
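The fact-table and dimension-table layout above can be sketched with `sqlite3`. The table names `item_dim`, `time_dim`, and `sales_fact`, the surrogate keys `item_key` and `time_key`, and all sample rows are assumptions for illustration. The time dimension is trimmed to two attributes to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "by what" of each measurement.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);

-- The fact table holds the measure plus a key into each dimension table.
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    dollars_sold REAL
);

INSERT INTO item_dim VALUES (1, 'TV-A', 'Acme', 'tv'), (2, 'Phone-B', 'Bolt', 'phone');
INSERT INTO time_dim VALUES (1, 'Q1', 2014), (2, 'Q2', 2014);
INSERT INTO sales_fact VALUES (1, 1, 500.0), (1, 2, 300.0), (2, 1, 200.0), (1, 1, 100.0);
""")

# Join the fact table to its dimensions and aggregate the measure
# along two dimension attributes (brand and quarter).
by_brand_quarter = conn.execute("""
    SELECT i.brand, t.quarter, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    JOIN time_dim AS t ON t.time_key = f.time_key
    GROUP BY i.brand, t.quarter
    ORDER BY i.brand, t.quarter
""").fetchall()
```

Each row of the result is one cell of a two-dimensional view of the sales cube: a (brand, quarter) pair with its total `dollars_sold`.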
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
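As a rough illustration of the lattice, the sketch below enumerates every cuboid of a tiny 3-D sales cube. The dimension names, the sample facts, and the helper `cuboid` are hypothetical. With n dimensions and no concept hierarchies there are 2^n cuboids, one per subset of the dimensions.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("item", "time", "location")

# Hypothetical base data: one row per recorded sale.
facts = [
    {"item": "TV", "time": "Q1", "location": "east", "dollars_sold": 100.0},
    {"item": "TV", "time": "Q2", "location": "west", "dollars_sold": 150.0},
    {"item": "Phone", "time": "Q1", "location": "east", "dollars_sold": 50.0},
]


def cuboid(rows, dims):
    """Sum dollars_sold grouped by the given subset of dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["dollars_sold"]
    return dict(totals)


# One cuboid per subset of the dimensions: 2**3 = 8 in total.
lattice = {
    dims: cuboid(facts, dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}

apex = lattice[()]          # 0-D cuboid: a single grand total
base = lattice[DIMENSIONS]  # 3-D base cuboid: no summarization
by_item = lattice[("item",)]
```

Here `apex` holds the single overall total, `base` keeps one cell per distinct (item, time, location) combination, and the six cuboids in between summarize along one or two dimensions.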
Data Warehousing and On-line Analytical Processing
Data Warehouse: Basic Concepts
Data Warehouse Modeling: Data Cube and OLAP
Data Warehouse Design and Usage
Data Warehouse Implementation
Data Generalization by Attribute-Oriented Induction
Summary