23-04-2012, 04:50 PM
Experiences with Real-Time Data Warehousing Using Oracle Database 10G
Real-Time Data [email protected] (Size: 311 KB / Downloads: 104)
My Background
An independent data warehousing consultant specializing in the dimensional approach to data warehouse / data mart design and implementation with in-depth experience utilizing efficient, scalable techniques whether dealing with large-scale data warehouses or small-scale, platform constrained data mart implementations. I deliver dimensional design and implementation as well as ETL workshops in the U.S. and Europe.
I have helped implement data warehouses using Redbrick, Oracle, Teradata, DB2, Informix, and SQL Server on mainframe, UNIX, and NT platforms, working with small and large businesses across a variety of industries including such customers as Hewlett Packard, American Express, General Mills, AT&T, Bell South, MCI, Oracle Slovakia, J.D. Power and Associates, Mobil Oil, The Health Alliance of Greater Cincinnati, and the French Railroad SNCF
Real-Time in Data Warehousing
Data Warehousing Systems are complex environments
Business rules
Various data process flows and dependencies
Almost never pure Real-Time
Some latency is a given
What do you need?
Real Time
Near Real-Time
Just in Time for the business
Customer Business Scenario
Client provides software solutions for utility companies
Utility companies have plants generating energy supply
Recommended maximum output capacity
Reserve Capacity
Buy supplemental energy as needed
Peak demand periods are somewhat predictable
Each day is pre-planned on historical behavior
Cheaper to buy energy ahead
Expensive to have unused capacity
Existing data warehouse supports the planning function
Reduced option expenses
Cut down of supplemental energy costs
Our Real-Time SolutionChange Capture and Population
Incremental change capture from operational site
Synchronous or Asynchronous
Transformation and Propagation (population) of change data to the DW
Continuous trickle feed or periodic batch
The Continuous Feed
Put an insert trigger on the change table which joins to the dimension tables picking up the dimension keys and does any necessary transformations
Summary
We created a real-time partition for current day activity
We put CDC on the operational table and created a change table populated by an asynchronous process (reads redo log)
We demonstrated continuous feed to the DW by using a trigger based approach
We demonstrated a batch DW feed by using the CDC subscribe process
We showed how to add the current day table to the fact table and set up the next days table
An electronic copy of the SQL used to build this prototype is available by emailing mike.schmitz[at]databaseperormance.com