18-05-2012, 04:28 PM
Data Warehouse / ETL Testing: Best Practices
Anandiya et al - Best Practices in data warehouse testing.pdf (Size: 53.11 KB / Downloads: 79)
Abstract
One of the greatest risks to success any company implementing a business intelligence
system can make is rushing a data warehouse into service without testing it effectively.
Even wise IT managers, who follow the Old Russian proverb, "trust, but verify," need to,
maintain their vigilance. There are pitfalls in the testing process, too.
Many organizations create test plans and assume that testing is over when every single test
condition passes their expected results. In reality, this is a very difficult bar to meet. Often,
some requirements turn out to be unattainable when tested against production information.
Some business rules turn out to be false or incredibly more complex than originally thought.
Data warehousing applications keep on changing with changing requirements. This white paper shares
some of the best practices of the experiences of data warehouse testing.
“If you torture data sufficiently, it will confess to almost anything.”
1 About Data Warehousing Testing
We know how critical the data is in a data warehouse when it integrates data from
different sources. For example, in the healthcare industry, it helps users to answer
business questions about physicians, plan the performance, market share and geographic
variations in clinical practice, health outcomes etc. Thus if the data is so sensitive, critical
and vast, we can understand how much challenging it would be. Thus this is a menial
effort to write about some of the best practices we learned while doing it on ground to
share it with others.
How much confident a company can be to implement its data warehouse in the market
without actually testing it thoroughly. The organizations gain the real confidence once the
data warehouse is verified and validated by the independent group of experts known as
“Data warehouse testers”.
2 Need for Data Warehouse testing: Best Practices
As we all know that a data warehouse is the main repository of any organization's
historical data. It contains the material and information for management's decision
support system.
Most of the organization runs their businesses on the basis of collection of data for
strategic decision- making. To take a competitive edge the organization should have the
ability to review historical trends and monitor real-time functional data.
3 Data warehousing testing phases
While implementing the best practices at our testing we follow the various phases in our
data warehouse testing. They are:
1) Business understanding
a. High Level Test Approach
b. Test Estimation
c. Review Business Specification
d. Attend Business Specification and Technical Specification
walkthroughs
2) Test plan creation, review and walkthrough
3) Test case creation, review and walkthrough
4) Test Bed & Environment setup
5) Receiving test data file from the developers
6) Test predictions creation, review (Setting up the expected results)
7) Test case execution and (regression testing if required).
a. Comparing the predictions with the actual results by testing the
business rules in the test environment.
b. Displaying the compare result in the separate worksheet.
8) Deployment
a. Validating the business rule in the production environment.
What is ETL?
ETL stands for extract, transform, and load. It can consolidate the scattered data for any
organization while working with different departments. It can very well handle the data
coming from different departments.
For example, a health insurance organization might have information on a customer in
several departments and each department might have that customer's information listed in
a different way. The membership department might list the customer by name, whereas
the claims department might list the customer by number. ETL can bundle all this data
and consolidate it into a uniform presentation, such as for storing in a database or data
warehouse.
Data Validation
• We say we have achieved the quality when we successfully fulfill customer’s
requirements. In other words we basically achieve a value for our customer. Since
in data warehouse testing; the test execution revolves around the data, so it is
important to achieve the degree of excellence for the data and for that we do the
data validation for both the data extracted from the source and then getting loaded
at the table.