Once certification of a set of data loads or controlled batches is complete, take image copies to re-baseline the database. Similarly, source systems should take image copies of their data to facilitate reconciliation. Quality and integrity issues are often found at this point, and restoring back to a certified copy will enable more efficient testing.
One of the key questions to ask regarding these findings is whether to repair the data or reload. If a data anomaly is the result of a flaw in the ETL logic, then a business typically should make the ETL change and reload the data or run a script to correct it. Investigate the source system for database schema changes. Once the database structure changes, analyze the impact of the changes and modify any impacted tests.
While your response will depend on the degree of severity, it is good practice to make a determination up front as to the general rules regarding such events—another reason to have data stewardship.
Testing that the data has transformed according to specifications and business rules can be complicated. Depending on how well an application was designed, data can go through several transformations in the transaction lifecycle. Once initial testing is completed, create a first load of data for validation testing. This is necessary, as the reports that the end-users will use to validate will contain the same data. It is prudent to consider the need for several years of data, so you should decide how far back you should go. Play close attention to year-end processing and reporting, as there could be some skeletons in those closets, as well.
How Much Data?
There are many different answers to this question. One is to approach the level of data under test that will be used for the BI process as the end goal. It is the data supporting the entire enterprise. Whatever the grouping will be—line of business, portfolio, business critical, etc.—start with a baseline of data, know the business’s relationships, and build tests that control complexity, variables, and size. Also, try to obtain production reports from the source systems to validate baseline data and facilitate comparison.